2.5.67-mm3

linux-mm.kvack.org archive mirror

* 2.5.67-mm3
@ 2003-04-14  8:53 Andrew Morton
  2003-04-14 11:03 ` 2.5.67-mm3 Bill Huey
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Andrew Morton @ 2003-04-14  8:53 UTC (permalink / raw)
  To: linux-kernel, linux-mm

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.67/2.5.67-mm3/

A bunch of new fixes, and a framebuffer update.  This should work a bit
better than -mm2.


Changes since 2.5.67-mm2:

-kobject-leak-fix.patch
-radix_tree_delete-api-cleanup.patch
-gen_rtc-compile-fix.patch
-warn-null-wakeup.patch
-remove-flush_page_to_ram.patch
-ext3-quota-deadlock-fix.patch
-dont-clear-PG_uptodate-on-ENOSPC.patch
-stack-protection-fix.patch
-sparc-PTE_FILE_MAX_BITS-fix.patch
-file_lock-spinlock.patch
-bootmem-speedup.patch
-mem_map-init-arch-hooks.patch
-tty-modem-control-api.patch
-kmalloc_sizes-fix.patch
-proc-interrupts-kmalloc-size.patch
-vmalloc-stats.patch
-meminfo-doc.patch
-percpu_counter.patch
-blockgroup_lock.patch
-ext2-no-lock_super-ng.patch
-ext2-ialloc-no-lock_super-ng.patch
-init-sections-in-kallsyms.patch

 Merged

+devclass-oops-workaround.patch

 Work around a kobject oops-on-boot

+ipip_err-compile-fix.patch

 Fix a build problem

 p4-oprofile-fix.patch

 This works now.

+genrtc-jiffies-fix.patch

 Compile warning fix (bugfix on 64-bit machines)

+export-kernel_fpu_begin.patch

 Fix modular RAID build

+tasklist_lock-dcache_lock-inversion-fix.patch

 Fix a deadlock

+vsyscall-unwinding.patch

 Stack unwinding code for vsyscalls

+mce-workqueue-startup-fix.patch

 Should fix a startup oops

+1394-compile-fix.patch

 Fix 1394 build

+nfs-resource-management.patch

 Fix NFS VM conniptions.

+fremap-all-mappings.patch

 Bring this back: it prefaults executable mmaps and speeds up
 application launching.

+lockmeter-fixes.patch

 Fix CONFIG_PREEMPT

+fbdev.patch

 Framebuffer update




All 94 patches

linus.patch

mm.patch
  add -mmN to EXTRAVERSION

kgdb-ga.patch
  kgdb stub for ia32 (George Anzinger's one)

kgdb-ga-warning-fix.patch
  -mm traps.c warning

kgdb-ga-up-warning-fix.patch

devclass-oops-workaround.patch
  work around oops in devclass_add_driver()

ipip_err-compile-fix.patch

p4-oprofile-fix.patch
  Fix oprofile on hyperthreaded P4's

flush_workqueue-hang-fix.patch
  flush_workqueue() fixes

tty-shutdown-race-fix.patch
  fix tty shutdown race

ppa-null-pointer-fix.patch

dmfe-kfree_skb-fix.patch
  dmfe: don't free skb with local interrupts disabled

config_spinline.patch
  uninline spinlocks for profiling accuracy.

ppc64-reloc_hide.patch

ppc64-pci-patch.patch
  Subject: pci patch

ppc64-aio-32bit-emulation.patch
  32/64bit emulation for aio

ppc64-scruffiness.patch
  Fix some PPC64 compile warnings

ppc64-update.patch
  ppc64 update

ppc64-update-fixes.patch

sym-do-160.patch
  make the SYM driver do 160 MB/sec

config-PAGE_OFFSET.patch
  Configurable kernel/user memory split

buffer-debug.patch
  buffer.c debugging

genrtc-jiffies-fix.patch
  genrtc: jiffies type fix

export-kernel_fpu_begin.patch
  export kernel_fpu_begin() to modules

ext3-truncate-ordered-pages.patch
  ext3: explicitly free truncated pages

reiserfs_file_write-5.patch

sched_idle-typo-fix.patch
  fix sched_idle typo

rcu-stats.patch
  RCU statistics reporting

ext3-journalled-data-assertion-fix.patch
  Remove incorrect assertion from ext3

nfs-speedup.patch

nfs-oom-fix.patch
  nfs oom fix

sk-allocation.patch
  Subject: Re: nfs oom

nfs-more-oom-fix.patch

rpciod-atomic-allocations.patch
  Make rpciod use atomic allocations

linux-isp.patch

isp-update-1.patch

posix-timer-hang-fix-2.patch
  Posix timer hang fix

tasklist_lock-dcache_lock-inversion-fix.patch
  Fix deadlock between tasklist_lock and dcache_lock

vsyscall-unwinding.patch
  Subject: unwinding for vsyscall code

mce-workqueue-startup-fix.patch
  fix MCE startup ordering problems

setserial-fix.patch
  Subject: [PATCH 2.5] Minor fix for driver/serial/core.c

1394-compile-fix.patch
  Fix ieee1394 nodemgr.c compile

nfs-resource-management.patch
  Subject: Resource management for NFS...

kblockd.patch
  Create `kblockd' workqueue

as-iosched.patch
  anticipatory I/O scheduler

as-np-reads-1.patch
  AS: read-vs-read fixes

as-np-reads-2.patch
  AS: more read-vs-read fixes

as-predict-data-direction.patch
  as: predict direction of next IO

as-remove-frontmerge.patch
  AS: remove frontmerge tunable

as-misc-cleanups.patch
  AS: misc cleanups

as-minor-tweaks.patch
  AS: tuning and tweaks

as-remove-stats.patch
  AS: remove statistics

as-locking-fix.patch
  AS: Fix minor race

as-disable-thinktime.patch

as-use-queue_empty.patch
  AS: Use the queue_empty API

cfq-2.patch
  CFQ scheduler, #2

unplug-use-kblockd.patch
  Use kblockd for running request queues

objrmap.patch
  object-based rmap

objrmap-sort-vma-list.patch
  objrmap: optimise per-mapping vma searches

objrmap-vma-sorting-fix.patch
  fix obj vma sorting

fremap-all-mappings.patch
  Make all executable mappings be nonlinear

32bit-dev_t-nfs-export-fix.patch
  Fix nfsd exports with big dev_t

sched-2.5.64-D3.patch
  sched-2.5.64-D3, more interactivity changes

scheduler-tunables.patch
  scheduler tunables

show_task-free-stack-fix.patch
  show_task() fix and cleanup

yellowfin-set_bit-fix.patch
  yellowfin driver set_bit fix

htree-nfs-fix.patch
  Fix ext3 htree / NFS compatibility problems

task_prio-fix.patch
  simple task_prio() fix

i8042-share-irqs.patch
  allow i8042 interrupt sharing

gfp_repeat.patch
  implement __GFP_REPEAT

alloc_buffer_head-take-gfp.patch
  make alloc_buffer_head take gfp_flags

pte_alloc_one-use-gfp_repeat.patch
  use __GFP_REPEAT in pte_alloc_one()

pmd_alloc_one-use-gfp_repeat.patch
  use __GFP_REPEAT in pmd_alloc_one()

dynamic-hd_struct-allocation.patch
  Allocate hd_structs dynamically

dynamic-hd_struct-devfs-fix.patch
  Fix dynamic hd_struct allocation for devfs

slab_store_user-large-objects.patch
  slab debug: perform redzoning against larger objects

htree-nfs-fix-2.patch
  htree nfs fix

put_task_struct-debug.patch

dev_t-32-bit.patch
  [for playing only] change type of dev_t

dev_t-remove-B_FREE.patch
  dev_t: eliminate B_FREE

sg-dev_t-fix.patch
  32-bit dev_t fix for sg

xfs-dev_t-warning-fix.patch
  xfs dev_t printk warning fix

aggregated-disk-stats.patch
  Aggregated disk statistics

oops-dump-preceding-code.patch
  i386 oops output: dump preceding code

usb-disconnect-crash-fix.patch
  Subject: Re: [linux-usb-devel] timer hang with current 2.5 BK

lockmeter.patch

lockmeter-fixes.patch

ext3-no-bkl.patch

journal_dirty_metadata-speedup.patch

journal_get_write_access-speedup.patch

ext3-concurrent-block-inode-allocation.patch
  Subject: [PATCH] concurrent block/inode allocation for EXT3

ext3-concurrent-block-allocation-fix-1.patch

ext3-concurrent-block-allocation-hashed.patch
  Subject: Re: [PATCH] concurrent block/inode allocation for EXT3

fbdev.patch



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org

* Re: 2.5.67-mm3
  2003-04-14  8:53 2.5.67-mm3 Andrew Morton
@ 2003-04-14 11:03 ` Bill Huey
  2003-04-14 15:13   ` 2.5.67-mm3 Rudmer van Dijk
  2003-04-15  2:00 ` 2.5.67-mm3 William Lee Irwin III
  2003-04-16  2:21 ` 2.5.67-mm3 William Lee Irwin III
  2 siblings, 1 reply; 21+ messages in thread
From: Bill Huey @ 2003-04-14 11:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Bill Huey (Hui)

On Mon, Apr 14, 2003 at 01:53:13AM -0700, Andrew Morton wrote:
> A bunch of new fixes, and a framebuffer update.  This should work a bit
> better than -mm2.

make -f scripts/Makefile.build obj=arch/i386/boot arch/i386/boot/bzImage
  ld -m elf_i386  -Ttext 0x0 -s --oformat binary -e begtext
  arch/i386/boot/setup.o -o arch/i386/boot/setup 
  arch/i386/boot/setup.o(.text+0x9a4): In function `video':
  /tmp/ccyhvWWu.s:2925: undefined reference to `store_edid'
  make[1]: *** [arch/i386/boot/setup] Error 1
  make: *** [bzImage] Error 2

---------------------------------------

Not sure what's triggering this here.

bill

* Re: 2.5.67-mm3
  2003-04-14 11:03 ` 2.5.67-mm3 Bill Huey
@ 2003-04-14 15:13   ` Rudmer van Dijk
  2003-04-15  1:03     ` 2.5.67-mm3 Bill Huey
  0 siblings, 1 reply; 21+ messages in thread
From: Rudmer van Dijk @ 2003-04-14 15:13 UTC (permalink / raw)
  To: Bill Huey (Hui), Andrew Morton; +Cc: linux-kernel, linux-mm

On Monday 14 April 2003 13:03, Bill Huey (Hui) wrote:
> On Mon, Apr 14, 2003 at 01:53:13AM -0700, Andrew Morton wrote:
> > A bunch of new fixes, and a framebuffer update.  This should work a bit
> > better than -mm2.
> 
> make -f scripts/Makefile.build obj=arch/i386/boot arch/i386/boot/bzImage
>   ld -m elf_i386  -Ttext 0x0 -s --oformat binary -e begtext
>   arch/i386/boot/setup.o -o arch/i386/boot/setup 
>   arch/i386/boot/setup.o(.text+0x9a4): In function `video':
>   /tmp/ccyhvWWu.s:2925: undefined reference to `store_edid'
>   make[1]: *** [arch/i386/boot/setup] Error 1
>   make: *** [bzImage] Error 2
> 
> ---------------------------------------

got this also.
store_edid is only defined when CONFIG_VIDEO_SELECT is set, but the call to it is
outside the #ifdef...

this patch fixes it. Maybe it is better to move the call to store_edid up 
inside the already available #ifdef, but I'm not sure if that is possible.

	Rudmer

--- linux-2.5.67-mm3/arch/i386/boot/video.S.orig	2003-04-14 17:07:24.000000000 +0200
+++ linux-2.5.67-mm3/arch/i386/boot/video.S	2003-04-14 17:03:08.000000000 +0200
@@ -135,7 +135,9 @@
 #endif /* CONFIG_VIDEO_RETAIN */
 #endif /* CONFIG_VIDEO_SELECT */
 	call	mode_params			# Store mode parameters
+#ifdef CONFIG_VIDEO_SELECT
 	call	store_edid
+#endif /* CONFIG_VIDEO_SELECT */
 	popw	%ds				# Restore original DS
 	ret
 

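The link failure is the usual one for config-dependent symbols: the definition of store_edid sits under CONFIG_VIDEO_SELECT but the call does not, so with the option off the linker has nothing to resolve. video.S is assembly, so the following is only a C analogue of the same pattern; the counters and function bodies are hypothetical stand-ins used to make the call path observable.

```c
#include <assert.h>

/* Uncomment to model a build with CONFIG_VIDEO_SELECT=y. */
/* #define CONFIG_VIDEO_SELECT 1 */

/* Hypothetical counters so the call path can be observed. */
static int mode_params_calls;
static int store_edid_calls;

static void mode_params(void)
{
	mode_params_calls++;
}

#ifdef CONFIG_VIDEO_SELECT
/* Only compiled in when the option is set, like store_edid in video.S. */
static void store_edid(void)
{
	store_edid_calls++;
}
#endif

/*
 * Mirrors the tail of video(): the call must sit under the same guard
 * as the definition, otherwise the link fails when the option is off.
 */
static void video_tail(void)
{
	mode_params();
#ifdef CONFIG_VIDEO_SELECT
	store_edid();
#endif
}
```

With the option undefined, video_tail() calls only mode_params(); defining CONFIG_VIDEO_SELECT brings back both the definition and the call together.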
* Re: 2.5.67-mm3
  2003-04-14 15:13   ` 2.5.67-mm3 Rudmer van Dijk
@ 2003-04-15  1:03     ` Bill Huey
  2003-04-15  1:13       ` 2.5.67-mm3 Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Bill Huey @ 2003-04-15  1:03 UTC (permalink / raw)
  To: Rudmer van Dijk; +Cc: Andrew Morton, linux-kernel, linux-mm, Bill Huey (Hui)

On Mon, Apr 14, 2003 at 05:13:05PM +0200, Rudmer van Dijk wrote:
> this patch fixes it. Maybe it is better to move the call to store_edid up 
> inside the already available #ifdef, but I'm not sure if that is possible.

Now I'm getting a console warning, "anticipatory scheduler", at boot time,
and then it freezes after mounting root read-only.

bill

* Re: 2.5.67-mm3
  2003-04-15  1:03     ` 2.5.67-mm3 Bill Huey
@ 2003-04-15  1:13       ` Andrew Morton
  2003-04-15  1:34         ` 2.5.67-mm3 Bill Huey
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2003-04-15  1:13 UTC (permalink / raw)
  To: Bill Huey (Hui); +Cc: rudmer, linux-kernel, linux-mm

Bill Huey (Hui) <billh@gnuppy.monkey.org> wrote:
>
> On Mon, Apr 14, 2003 at 05:13:05PM +0200, Rudmer van Dijk wrote:
> > this patch fixes it. Maybe it is better to move the call to store_edid up 
> > inside the already available #ifdef, but I'm not sure if that is possible.
> 
> Now I'm getting a console warning, "anticipatory scheduler", at boot time,
> and then it freezes after mounting root read-only.
> 

Could be anything.   Does sysrq not work?

If not, please send me your .config.

* Re: 2.5.67-mm3
  2003-04-15  1:13       ` 2.5.67-mm3 Andrew Morton
@ 2003-04-15  1:34         ` Bill Huey
  2003-04-15  9:38           ` 2.5.67-mm3 Rudmer van Dijk
  0 siblings, 1 reply; 21+ messages in thread
From: Bill Huey @ 2003-04-15  1:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: rudmer, linux-kernel, linux-mm, Bill Huey (Hui)

On Mon, Apr 14, 2003 at 06:13:02PM -0700, Andrew Morton wrote:
> Could be anything.   Does sysrq not work?
> 
> If not, please send me your .config.

It does it with elevator=deadline too. I'll see if I can get you a better
dump.

bill

* Re: 2.5.67-mm3
  2003-04-15  1:34         ` 2.5.67-mm3 Bill Huey
@ 2003-04-15  9:38           ` Rudmer van Dijk
  0 siblings, 0 replies; 21+ messages in thread
From: Rudmer van Dijk @ 2003-04-15  9:38 UTC (permalink / raw)
  To: Bill Huey (Hui), Andrew Morton; +Cc: linux-kernel, linux-mm

On Tuesday 15 April 2003 03:34, Bill Huey (Hui) wrote:
> On Mon, Apr 14, 2003 at 06:13:02PM -0700, Andrew Morton wrote:
> > Could be anything.   Does sysrq not work?
> > 
> > If not, please send me your .config.
> 
> It does it with elevator=deadline too. I'll see if I can get you a better
> dump.

no problems here (running with anticipatory scheduling elevator), current 
uptime is 18h. 
the only problem I have is that kmod does not seem to work; modules have to be 
inserted manually... probably a problem with module-init-tools.

	Rudmer

* Re: 2.5.67-mm3
  2003-04-14  8:53 2.5.67-mm3 Andrew Morton
  2003-04-14 11:03 ` 2.5.67-mm3 Bill Huey
@ 2003-04-15  2:00 ` William Lee Irwin III
  2003-04-15  4:17   ` 2.5.67-mm3 William Lee Irwin III
  2003-04-16  2:21 ` 2.5.67-mm3 William Lee Irwin III
  2 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15  2:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

On Mon, Apr 14, 2003 at 01:53:13AM -0700, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.67/2.5.67-mm3/
> 
> A bunch of new fixes, and a framebuffer update.  This should work a bit
> better than -mm2.


If one's goal is to free highmem pages, shrink_slab() is an ineffective
method of recovering them, as slab pages are all ZONE_NORMAL or ZONE_DMA.
Hence, this "FIXME: do not do for zone highmem". Presumably this is a
question of policy, as highmem allocations may be satisfied by reaping
slab pages and handing them back; but the FIXME says what we should do.
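The patch below derives the zone index by pointer subtraction against the node's node_zones[] array and skips shrink_slab() when that index is ZONE_HIGHMEM or above. A userspace model of the check follows; the types are cut-down stand-ins rather than the kernel's real struct zone / pg_data_t definitions, and should_shrink_slab() and init_node() are hypothetical helpers.

```c
#include <assert.h>

/* Simplified stand-ins for the 2.5-era zone types. */
enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };

struct pglist_data;

struct zone {
	struct pglist_data *zone_pgdat;	/* node this zone belongs to */
};

typedef struct pglist_data {
	struct zone node_zones[MAX_NR_ZONES];
} pg_data_t;

/* Zone index = offset of the zone within its node's node_zones[] array. */
static int zone_idx(const struct zone *zone)
{
	return (int)(zone - zone->zone_pgdat->node_zones);
}

/* Slab pages never come from highmem, so only shrink slab for lower zones. */
static int should_shrink_slab(const struct zone *classzone)
{
	return zone_idx(classzone) < ZONE_HIGHMEM;
}

/* Wire every zone back to its node, as the real pgdat setup would. */
static void init_node(pg_data_t *pgdat)
{
	for (int i = 0; i < MAX_NR_ZONES; i++)
		pgdat->node_zones[i].zone_pgdat = pgdat;
}
```

In balance_pgdat() the patch appears to use the loop index i directly, which is the same zone index without the pointer arithmetic.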


diff -urpN mm3-2.5.67-1/mm/vmscan.c mm3-2.5.67-2/mm/vmscan.c
--- mm3-2.5.67-1/mm/vmscan.c	2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-2/mm/vmscan.c	2003-04-14 18:16:41.000000000 -0700
@@ -134,11 +134,9 @@ void remove_shrinker(struct shrinker *sh
  * If the vm encounted mapped pages on the LRU it increase the pressure on
  * slab to avoid swapping.
  *
- * FIXME: do not do for zone highmem
- *
  * We do weird things to avoid (scanned*seeks*entries) overflowing 32 bits.
  */
-static int shrink_slab(long scanned,  unsigned int gfp_mask)
+static int shrink_slab(long scanned, unsigned int gfp_mask)
 {
 	struct shrinker *shrinker;
 	long pages;
@@ -835,7 +833,8 @@ try_to_free_pages(struct zone *classzone
 
 		/* Take a nap, wait for some writeback to complete */
 		blk_congestion_wait(WRITE, HZ/10);
-		shrink_slab(total_scanned, gfp_mask);
+		if (classzone - classzone->zone_pgdat->node_zones < ZONE_HIGHMEM)
+			shrink_slab(total_scanned, gfp_mask);
 	}
 	if (gfp_mask & __GFP_FS)
 		out_of_memory();
@@ -895,7 +894,8 @@ static int balance_pgdat(pg_data_t *pgda
 				max_scan = SWAP_CLUSTER_MAX;
 			to_free -= shrink_zone(zone, max_scan, GFP_KERNEL,
 					to_reclaim, &nr_mapped, ps, priority);
-			shrink_slab(max_scan + nr_mapped, GFP_KERNEL);
+			if (i < ZONE_HIGHMEM)
+				shrink_slab(max_scan + nr_mapped, GFP_KERNEL);
 			if (zone->all_unreclaimable)
 				continue;
 			if (zone->pages_scanned > zone->present_pages * 2)

* Re: 2.5.67-mm3
  2003-04-15  2:00 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15  4:17   ` William Lee Irwin III
  2003-04-15  4:31     ` 2.5.67-mm3 Andrew Morton
  2003-04-15  5:52     ` 2.5.67-mm3 Antonio Vargas
  0 siblings, 2 replies; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15  4:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm

On Mon, Apr 14, 2003 at 07:00:57PM -0700, William Lee Irwin III wrote:
> Hence, this "FIXME: do not do for zone highmem". Presumably this is a

Another FIXME patch:


It's a bit of an open question how much difference this one makes now,
but it is marked "FIXME". fault_in_pages_writeable() and
fault_in_pages_readable() are limited in the size of the region they
can prefault: as written, they only handle a region that crosses at
most one page boundary. This converts them to iterate over the
specified virtual address range, touching each virtual page within it
once. As the "FIXME" comment notes, this is only an issue if
PAGE_SIZE < PAGE_CACHE_SIZE.


diff -urpN mm3-2.5.67-3/include/linux/pagemap.h mm3-2.5.67-4/include/linux/pagemap.h
--- mm3-2.5.67-3/include/linux/pagemap.h	2003-04-07 10:30:34.000000000 -0700
+++ mm3-2.5.67-4/include/linux/pagemap.h	2003-04-14 19:10:17.000000000 -0700
@@ -168,46 +168,32 @@ extern void end_page_writeback(struct pa
 
 /*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
- *
- * FIXME: this assumes that two userspace pages are always sufficient.  That's
- * not true if PAGE_CACHE_SIZE > PAGE_SIZE.
+ * Both walk with stride PAGE_SIZE touching the start of the affected
+ * pages until faulting or at the start of a page beyond size from uaddr.
  */
 static inline int fault_in_pages_writeable(char *uaddr, int size)
 {
-	int ret;
+	int ret = 0;
+	unsigned long addr = (unsigned long)uaddr & PAGE_MASK;
 
 	/*
 	 * Writing zeroes into userspace here is OK, because we know that if
 	 * the zero gets there, we'll be overwriting it.
 	 */
-	ret = __put_user(0, uaddr);
-	if (ret == 0) {
-		char *end = uaddr + size - 1;
-
-		/*
-		 * If the page was already mapped, this will get a cache miss
-		 * for sure, so try to avoid doing it.
-		 */
-		if (((unsigned long)uaddr & PAGE_MASK) !=
-				((unsigned long)end & PAGE_MASK))
-		 	ret = __put_user(0, end);
-	}
+	for (; addr < (unsigned long)uaddr + size && !ret; addr += PAGE_SIZE)
+		ret = __put_user(0, (char *)max(addr, (unsigned long)uaddr));
+
 	return ret;
 }
 
 static inline void fault_in_pages_readable(const char *uaddr, int size)
 {
 	volatile char c;
-	int ret;
+	int ret = 0;
+	unsigned long addr = (unsigned long)uaddr & PAGE_MASK;
 
-	ret = __get_user(c, (char *)uaddr);
-	if (ret == 0) {
-		const char *end = uaddr + size - 1;
-
-		if (((unsigned long)uaddr & PAGE_MASK) !=
-				((unsigned long)end & PAGE_MASK))
-		 	__get_user(c, (char *)end);
-	}
+	for (; addr < (unsigned long)uaddr + size && !ret; addr += PAGE_SIZE)
+		ret = __get_user(c, (char *)addr);
 }
 
 #endif /* _LINUX_PAGEMAP_H */

* Re: 2.5.67-mm3
  2003-04-15  4:17   ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15  4:31     ` Andrew Morton
  2003-04-15  4:39       ` 2.5.67-mm3 William Lee Irwin III
  2003-04-15  5:52     ` 2.5.67-mm3 Antonio Vargas
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2003-04-15  4:31 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>
> +	for (; addr < (unsigned long)uaddr + size && !ret; addr += PAGE_SIZE)
> +		ret = __put_user(0, (char *)max(addr, (unsigned long)uaddr));

This hurts my brain.  If anything, it should be formulated as a do-while loop.

But I'm not sure we should really bother, because relatively large amounts of
stuff is broken for PAGE_SIZE != PAGE_CACHE_SIZE anyway.  tmpfs comes to
mind...

If page clustering needs to redo this code (and I assume it does) then that
would be an argument in favour.
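The do-while shape suggested above can be sketched in userspace. This is a hypothetical model, not the kernel helper: touch() stands in for __put_user()/__get_user(), a 4096-byte page is assumed, and addresses are plain integers. It probes uaddr itself, then the start of each following page up to and including the page that holds the last byte.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL			/* assumed page size */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical probe log standing in for __put_user()/__get_user(). */
static unsigned long touched[64];
static size_t ntouched;

static int touch(unsigned long addr)
{
	if (ntouched == sizeof(touched) / sizeof(touched[0]))
		return -1;	/* pretend the probe faulted */
	touched[ntouched++] = addr;
	return 0;
}

/* Probe one byte in every page spanned by [uaddr, uaddr + size). */
static int fault_in_range(unsigned long uaddr, int size)
{
	unsigned long addr = uaddr;
	unsigned long end;
	int ret;

	if (size <= 0)
		return 0;
	end = (uaddr + size - 1) & SKETCH_PAGE_MASK;	/* page of last byte */

	do {
		ret = touch(addr);
		addr = (addr & SKETCH_PAGE_MASK) + SKETCH_PAGE_SIZE;
	} while (ret == 0 && addr <= end);

	return ret;
}
```

For example, a 2-byte range starting at 4095 probes 4095 and 4096, while an aligned 4096-byte range probes a single page.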

* Re: 2.5.67-mm3
  2003-04-15  4:31     ` 2.5.67-mm3 Andrew Morton
@ 2003-04-15  4:39       ` William Lee Irwin III
  2003-04-15  4:55         ` 2.5.67-mm3 Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15  4:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>> +	for (; addr < (unsigned long)uaddr + size && !ret; addr += PAGE_SIZE)
>> +		ret = __put_user(0, (char *)max(addr, (unsigned long)uaddr));

On Mon, Apr 14, 2003 at 09:31:14PM -0700, Andrew Morton wrote:
> This hurts my brain.  If anything, it should be formulated as a do-while loop.
> But I'm not sure we should really bother, because relatively large amounts of
> stuff is broken for PAGE_SIZE != PAGE_CACHE_SIZE anyway.  tmpfs comes to
> mind...
> If page clustering needs to redo this code (and I assume it does) then that
> would be an argument in favour.

Page clustering wants something similar but slightly different. The
unit it wants as its stride (MMUPAGE_SIZE) isn't present so this doesn't
really help or hurt it. I believe I actually dodged this bullet by
ensuring (or incorrectly assuming) the callers used sizes <= MMUPAGE_SIZE
and left it either unaltered and suboptimal or (worst-case) buggy.

I'm just going down the list of FIXME's in the VM I turned up by grepping.
Should we do the following instead?


-- wli


diff -urpN mm3-2.5.67-2/include/linux/pagemap.h mm3-2.5.67-2A/include/linux/pagemap.h
--- mm3-2.5.67-2/include/linux/pagemap.h	2003-04-07 10:30:34.000000000 -0700
+++ mm3-2.5.67-2A/include/linux/pagemap.h	2003-04-14 21:24:52.000000000 -0700
@@ -169,7 +169,7 @@ extern void end_page_writeback(struct pa
 /*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
  *
- * FIXME: this assumes that two userspace pages are always sufficient.  That's
+ * This assumes that two userspace pages are always sufficient.  That's
  * not true if PAGE_CACHE_SIZE > PAGE_SIZE.
  */
 static inline int fault_in_pages_writeable(char *uaddr, int size)

* Re: 2.5.67-mm3
  2003-04-15  4:39       ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15  4:55         ` Andrew Morton
  2003-04-15  5:15           ` 2.5.67-mm3 William Lee Irwin III
  2003-04-15  6:09           ` 2.5.67-mm3 William Lee Irwin III
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2003-04-15  4:55 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>
> Page clustering wants something similar but slightly different. The
> unit it wants as its stride (MMUPAGE_SIZE) isn't present so this doesn't
> really help or hurt it. I believe I actually dodged this bullet by
> ensuring (or incorrectly assuming) the callers used sizes <= MMUPAGE_SIZE
> and left it either unaltered and suboptimal or (worst-case) buggy.

Callers will use sizes between 1 and PAGE_CACHE_SIZE, with arbitrary
alignment.  So you may need to fault in up to

	(PAGE_CACHE_SIZE / MMUPAGE_SIZE) + 1

pte's.  And up to two PAGE_CACHE_SIZE pages.
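The bound comes from counting how many pages of a given size the byte range [uaddr, uaddr + size) spans. A small helper, hypothetical and with the page size passed as a parameter so that a 4 KB MMUPAGE_SIZE and a 16 KB PAGE_CACHE_SIZE can be assumed, makes the worst case concrete:

```c
#include <assert.h>

/* Number of pgsz-sized pages touched by the byte range [uaddr, uaddr + size). */
static unsigned long pages_spanned(unsigned long uaddr, unsigned long size,
				   unsigned long pgsz)
{
	if (size == 0)
		return 0;
	return (uaddr + size - 1) / pgsz - uaddr / pgsz + 1;
}
```

An aligned 16 KB write covers exactly 16K/4K = 4 MMU pages; shifting it by one byte makes it cover 5, i.e. (PAGE_CACHE_SIZE / MMUPAGE_SIZE) + 1, and at most two 16 KB page-cache pages.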

Sort-of.  The code is doing two things.

a) Make sure that all the relevant pte's are established in the correct
   state so we don't take a fault while holding the subsequent atomic kmap.

   This is just an optimisation.  If we _do_ take the fault while holding
   an atomic kmap, we fall back to sleeping kmap, and do the whole copy
   again.  It almost never happens.

b) Making sure that the pagecache page is present before we lock it.  This
   is to handle the icky deadlock which occurs when someone is doing a
   write() into a MAP_SHARED region of the file, where the source and dest of
   the copy are the same physical page.  If we take a fault and then try to
   bring the page uptodate in the fault handler we deadlock because the page
   is already locked.

   The fault-by-hand-before-locking-the-page is racy - if the VM steals
   the page again before we lock it (rare), the deadlock can still occur.

   I've been able to trigger the fault which causes fallback to kmap()
   occasionally, under heavy load.  But never the deadlock.

   We don't know how to fix this for real.  I had a patch for a while which
   added current->locked_page, and filemap_nopage() would compare that with
   the to-be-locked page and say "ah-hah!" and take avoiding action.

   But then Hugh rudely pointed out that the deadlock was still present if
   two tasks were involved, each trying to fault in the other's locked page.
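The current->locked_page idea amounts to a per-task note of which page the task already holds locked, checked before the fault path sleeps on a page lock. A userspace model follows; struct task, nopage_would_self_deadlock(), cross_deadlock() and the integer page ids are all hypothetical, and the real check would have lived in filemap_nopage(). It shows why the heuristic catches the single-task case and why it misses the two-task cross case described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: pages are small integers, 0 means "none". */
struct task {
	int locked_page;	/* page this task locked in write() */
	int waiting_on;		/* page its fault handler wants to lock */
};

/* The locked_page check: refuse to sleep on a page we already hold. */
static bool nopage_would_self_deadlock(const struct task *t, int page)
{
	return t->locked_page == page;
}

/* A two-task wait-for cycle: each wants the page the other holds. */
static bool cross_deadlock(const struct task *a, const struct task *b)
{
	return a->locked_page != 0 && b->locked_page != 0 &&
	       a->waiting_on == b->locked_page &&
	       b->waiting_on == a->locked_page;
}
```

For a single task faulting on its own locked page the check fires; for tasks A (holds page 1, wants 2) and B (holds 2, wants 1) it passes for both, yet the pair still deadlocks.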

> I'm just going down the list of FIXME's in the VM I turned up by grepping.
> Should we do the following instead?

OK ;)

* Re: 2.5.67-mm3
  2003-04-15  4:55         ` 2.5.67-mm3 Andrew Morton
@ 2003-04-15  5:15           ` William Lee Irwin III
  2003-04-15  5:35             ` 2.5.67-mm3 Andrew Morton
  2003-04-15  6:09           ` 2.5.67-mm3 William Lee Irwin III
  1 sibling, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15  5:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
> > Page clustering wants something similar but slightly different. The
> > unit it wants as its stride (MMUPAGE_SIZE) isn't present so this doesn't
> > really help or hurt it. I believe I actually dodged this bullet by
> > ensuring (or incorrectly assuming) the callers used sizes <= MMUPAGE_SIZE
> > and left it either unaltered and suboptimal or (worst-case) buggy.

On Mon, Apr 14, 2003 at 09:55:41PM -0700, Andrew Morton wrote:
> Callers will use sizes between 1 and PAGE_CACHE_SIZE, with arbitrary
> alignment.  So you may need to fault in up to
> 	(PAGE_CACHE_SIZE / MMUPAGE_SIZE) + 1
> pte's.  And up to two PAGE_CACHE_SIZE pages.

I checked on what pgcl was doing, and it's buggy. Thanks, I'll fix it
up and keep it rolling for when the time is right to send pieces in
(if ever for 2.6).


On Mon, Apr 14, 2003 at 09:55:41PM -0700, Andrew Morton wrote:
> Sort-of.  The code is doing two things.
> a) Make sure that all the relevant pte's are established in the correct
>    state so we don't take a fault while holding the subsequent atomic kmap.
>    This is just an optimisation.  If we _do_ take the fault while holding
>    an atomic kmap, we fall back to sleeping kmap, and do the whole copy
>    again.  It almost never happens.

This is the easy part; we're basically just prefaulting.


On Mon, Apr 14, 2003 at 09:55:41PM -0700, Andrew Morton wrote:
> b) Making sure that the pagecache page is present before we lock it.  This
>    is to handle the icky deadlock which occurs when someone is doing a
>    write() into a MAP_SHARED region of the file, where the source and dest of
>    the copy are the same physical page.  If we take a fault and then try to
>    bring the page uptodate in the fault handler we deadlock because the page
>    is already locked.
>    The fault-by-hand-before-locking-the-page is racy - if the VM steals
>    the page again before we lock it (rare), the deadlock can still occur.
>    I've been able to trigger the fault which causes fallback to kmap()
>    occasionally, under heavy load.  But never the deadlock.
>    We don't know how to fix this for real.  I had a patch for a while which
>    added current->locked_page, and filemap_nopage() would compare that with
>    the to-be-locked page and say "ah-hah!" and take avoiding action.
>    But then Hugh rudely pointed out that the deadlock was still present if
>    two tasks were involved, each trying to fault in the other's locked page.

This sounds more serious. My first thought is address-ordering the
locking, but it's not obvious how to do that with the current control
flow structure (and it sounds oversimplified). On my list of things to
think about.
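Address-ordered locking is the textbook cure for this kind of ABBA deadlock: if every task that needs two page locks takes the lower-addressed one first, no wait-for cycle can form. The following is a minimal pthreads sketch with hypothetical names (struct page_lock, lock_two_pages(), run_crossed_lockers()); kernel page locks are not pthread mutexes, and, as noted above, the fault path does not naturally know both pages up front, so this only illustrates the ordering rule.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a page and its lock. */
struct page_lock {
	pthread_mutex_t mutex;
};

/*
 * Take both locks in address order, so two tasks can never each hold one
 * while waiting for the other.  a == b (the same page) is locked once.
 */
static void lock_two_pages(struct page_lock *a, struct page_lock *b)
{
	if (a == b) {
		pthread_mutex_lock(&a->mutex);
		return;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		struct page_lock *tmp = a;

		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->mutex);
	pthread_mutex_lock(&b->mutex);
}

static void unlock_two_pages(struct page_lock *a, struct page_lock *b)
{
	pthread_mutex_unlock(&a->mutex);
	if (a != b)
		pthread_mutex_unlock(&b->mutex);
}

struct crossed_args {
	struct page_lock *first, *second;
	int iterations;
	long *counter;		/* only touched while holding both locks */
};

static void *crossed_worker(void *p)
{
	struct crossed_args *args = p;

	for (int i = 0; i < args->iterations; i++) {
		lock_two_pages(args->first, args->second);
		(*args->counter)++;
		unlock_two_pages(args->first, args->second);
	}
	return NULL;
}

/* Two threads request the same pair in opposite orders; returns the count. */
static long run_crossed_lockers(int iterations)
{
	struct page_lock p1, p2;
	long counter = 0;
	struct crossed_args a = { &p1, &p2, iterations, &counter };
	struct crossed_args b = { &p2, &p1, iterations, &counter };
	pthread_t ta, tb;

	pthread_mutex_init(&p1.mutex, NULL);
	pthread_mutex_init(&p2.mutex, NULL);
	pthread_create(&ta, NULL, crossed_worker, &a);
	pthread_create(&tb, NULL, crossed_worker, &b);
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);
	pthread_mutex_destroy(&p1.mutex);
	pthread_mutex_destroy(&p2.mutex);
	return counter;
}
```

Without the swap in lock_two_pages(), the two workers could each end up holding one mutex and waiting forever for the other.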


-- wli

* Re: 2.5.67-mm3
  2003-04-15  5:15           ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15  5:35             ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2003-04-15  5:35 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>
> On Mon, Apr 14, 2003 at 09:55:41PM -0700, Andrew Morton wrote:
> > Sort-of.  The code is doing two things.
> > a) Make sure that all the relevant pte's are established in the correct
> >    state so we don't take a fault while holding the subsequent atomic kmap.
> >    This is just an optimisation.  If we _do_ take the fault while holding
> >    an atomic kmap, we fall back to sleeping kmap, and do the whole copy
> >    again.  It almost never happens.
> 
> This is the easy part; we're basically just prefaulting.

btw, this may sound like a lot of futzing about, but the other day I
timed four concurrent instances of

	dd if=/dev/zero of=$i bs=1 count=1M

on the four-way.  2.5 ran eight times faster than 2.4.  2.4's kmap_lock
contention was astonishing.

* Re: 2.5.67-mm3
  2003-04-15  4:55         ` 2.5.67-mm3 Andrew Morton
  2003-04-15  5:15           ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15  6:09           ` William Lee Irwin III
  2003-04-15  6:10             ` 2.5.67-mm3 William Lee Irwin III
  1 sibling, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15  6:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>> I'm just going down the list of FIXME's in the VM I turned up by grepping.
>> Should we do the following instead?

On Mon, Apr 14, 2003 at 09:55:41PM -0700, Andrew Morton wrote:
> OK ;)

Okay, these don't get us all the way there, but at least it gets us
closer: there are "FIXME" things associated with intrusions of the
old buffer_cache (as opposed to the new buffer cache a.k.a. pagecache)
into the core VM.

The first is simply:


Move __set_page_dirty_buffers() to fs/buffer.c, as per the FIXME.


diff -urpN mm3-2.5.67-1/fs/buffer.c mm3-2.5.67-2B/fs/buffer.c
--- mm3-2.5.67-1/fs/buffer.c	2003-04-14 18:08:14.000000000 -0700
+++ mm3-2.5.67-2B/fs/buffer.c	2003-04-14 22:21:19.000000000 -0700
@@ -779,6 +779,85 @@ void mark_buffer_dirty_inode(struct buff
 EXPORT_SYMBOL(mark_buffer_dirty_inode);
 
 /*
+ * Add a page to the dirty page list.
+ *
+ * It is a sad fact of life that this function is called from several places
+ * deeply under spinlocking.  It may not sleep.
+ *
+ * If the page has buffers, the uptodate buffers are set dirty, to preserve
+ * dirty-state coherency between the page and the buffers.  If the page does
+ * not have buffers then when they are later attached they will all be set
+ * dirty.
+ *
+ * The buffers are dirtied before the page is dirtied.  There's a small race
+ * window in which a writepage caller may see the page cleanness but not the
+ * buffer dirtiness.  That's fine.  If this code were to set the page dirty
+ * before the buffers, a concurrent writepage caller could clear the page dirty
+ * bit, see a bunch of clean buffers and we'd end up with dirty buffers/clean
+ * page on the dirty page list.
+ *
+ * There is also a small window where the page is dirty, and not on dirty_pages.
+ * Also a possibility that by the time the page is added to dirty_pages, it has
+ * been set clean.  The page lists are somewhat approximate in this regard.
+ * It's better to have clean pages accidentally attached to dirty_pages than to
+ * leave dirty pages attached to clean_pages.
+ *
+ * We use private_lock to lock against try_to_free_buffers while using the
+ * page's buffer list.  Also use this to protect against clean buffers being
+ * added to the page after it was set dirty.
+ *
+ * FIXME: may need to call ->reservepage here as well.  That's rather up to the
+ * address_space though.
+ *
+ * For now, we treat swapper_space specially.  It doesn't use the normal
+ * block a_ops.
+ */
+int __set_page_dirty_buffers(struct page *page)
+{
+	struct address_space * const mapping = page->mapping;
+	int ret = 0;
+
+	if (mapping == NULL) {
+		SetPageDirty(page);
+		goto out;
+	}
+
+	if (!PageUptodate(page))
+		buffer_error();
+
+	spin_lock(&mapping->private_lock);
+	if (page_has_buffers(page)) {
+		struct buffer_head *head = page_buffers(page);
+		struct buffer_head *bh = head;
+
+		do {
+			if (buffer_uptodate(bh))
+				set_buffer_dirty(bh);
+			else
+				buffer_error();
+			bh = bh->b_this_page;
+		} while (bh != head);
+	}
+	spin_unlock(&mapping->private_lock);
+
+	if (!TestSetPageDirty(page)) {
+		spin_lock(&mapping->page_lock);
+		if (page->mapping) {	/* Race with truncate? */
+			if (!mapping->backing_dev_info->memory_backed)
+				inc_page_state(nr_dirty);
+			list_del(&page->list);
+			list_add(&page->list, &mapping->dirty_pages);
+		}
+		spin_unlock(&mapping->page_lock);
+		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+	}
+	
+out:
+	return ret;
+}
+EXPORT_SYMBOL(__set_page_dirty_buffers);
+
+/*
  * Write out and wait upon a list of buffers.
  *
  * We have conflicting pressures: we want to make sure that all
diff -urpN mm3-2.5.67-1/mm/page-writeback.c mm3-2.5.67-2B/mm/page-writeback.c
--- mm3-2.5.67-1/mm/page-writeback.c	2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-2B/mm/page-writeback.c	2003-04-14 22:20:23.000000000 -0700
@@ -462,88 +462,6 @@ int write_one_page(struct page *page, in
 EXPORT_SYMBOL(write_one_page);
 
 /*
- * Add a page to the dirty page list.
- *
- * It is a sad fact of life that this function is called from several places
- * deeply under spinlocking.  It may not sleep.
- *
- * If the page has buffers, the uptodate buffers are set dirty, to preserve
- * dirty-state coherency between the page and the buffers.  It the page does
- * not have buffers then when they are later attached they will all be set
- * dirty.
- *
- * The buffers are dirtied before the page is dirtied.  There's a small race
- * window in which a writepage caller may see the page cleanness but not the
- * buffer dirtiness.  That's fine.  If this code were to set the page dirty
- * before the buffers, a concurrent writepage caller could clear the page dirty
- * bit, see a bunch of clean buffers and we'd end up with dirty buffers/clean
- * page on the dirty page list.
- *
- * There is also a small window where the page is dirty, and not on dirty_pages.
- * Also a possibility that by the time the page is added to dirty_pages, it has
- * been set clean.  The page lists are somewhat approximate in this regard.
- * It's better to have clean pages accidentally attached to dirty_pages than to
- * leave dirty pages attached to clean_pages.
- *
- * We use private_lock to lock against try_to_free_buffers while using the
- * page's buffer list.  Also use this to protect against clean buffers being
- * added to the page after it was set dirty.
- *
- * FIXME: may need to call ->reservepage here as well.  That's rather up to the
- * address_space though.
- *
- * For now, we treat swapper_space specially.  It doesn't use the normal
- * block a_ops.
- *
- * FIXME: this should move over to fs/buffer.c - buffer_heads have no business in mm/
- */
-#include <linux/buffer_head.h>
-int __set_page_dirty_buffers(struct page *page)
-{
-	struct address_space * const mapping = page->mapping;
-	int ret = 0;
-
-	if (mapping == NULL) {
-		SetPageDirty(page);
-		goto out;
-	}
-
-	if (!PageUptodate(page))
-		buffer_error();
-
-	spin_lock(&mapping->private_lock);
-	if (page_has_buffers(page)) {
-		struct buffer_head *head = page_buffers(page);
-		struct buffer_head *bh = head;
-
-		do {
-			if (buffer_uptodate(bh))
-				set_buffer_dirty(bh);
-			else
-				buffer_error();
-			bh = bh->b_this_page;
-		} while (bh != head);
-	}
-	spin_unlock(&mapping->private_lock);
-
-	if (!TestSetPageDirty(page)) {
-		spin_lock(&mapping->page_lock);
-		if (page->mapping) {	/* Race with truncate? */
-			if (!mapping->backing_dev_info->memory_backed)
-				inc_page_state(nr_dirty);
-			list_del(&page->list);
-			list_add(&page->list, &mapping->dirty_pages);
-		}
-		spin_unlock(&mapping->page_lock);
-		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-	}
-	
-out:
-	return ret;
-}
-EXPORT_SYMBOL(__set_page_dirty_buffers);
-
-/*
  * For address_spaces which do not use buffers.  Just set the page's dirty bit
  * and move it to the dirty_pages list.  Also perform space reservation if
  * required.
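The ordering argument in the moved comment (dirty the buffers first, then test-and-set the page dirty bit) can be illustrated with C11 atomics. This is a hypothetical toy, not the kernel code: `struct toy_bh_page` has a fixed array of four buffers, non-uptodate buffers are simply skipped rather than reported via buffer_error(), and no locking or dirty_pages list handling is modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_BUFS 4

struct toy_bh_page {
	atomic_bool dirty;			/* stands in for PG_dirty */
	atomic_bool buf_dirty[TOY_BUFS];	/* per-buffer dirty bits */
	bool buf_uptodate[TOY_BUFS];
};

/*
 * Dirty the uptodate buffers before the page.  With the default
 * sequentially consistent atomics, a thread that observes dirty == true
 * and then reads the buffer bits sees the stores made before it.
 * Returns true only for the clean -> dirty transition, the case in
 * which the real function moves the page onto dirty_pages.
 */
static bool toy_set_page_dirty_buffers(struct toy_bh_page *page)
{
	for (int i = 0; i < TOY_BUFS; i++)
		if (page->buf_uptodate[i])
			atomic_store(&page->buf_dirty[i], true);

	return !atomic_exchange(&page->dirty, true);	/* like TestSetPageDirty() */
}
```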

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.67-mm3
  2003-04-15  6:09           ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15  6:10             ` William Lee Irwin III
  0 siblings, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15  6:10 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm

On Mon, Apr 14, 2003 at 11:09:07PM -0700, William Lee Irwin III wrote:
> Okay, these don't get us all the way there, but at least it gets us
> closer: there are "FIXME" things associated with intrusions of the
> old buffer_cache (as opposed to the new buffer cache a.k.a. pagecache)
> into the core VM.
> The first is simply:
> Move __set_page_dirty_buffers() to fs/buffer.c, as per the FIXME.

On top of the __set_page_dirty_buffers() patch:


Replace page_has_buffers() with PagePrivate() in the swap BUG_ON() checks,
drop the buffer_head.h includes that are no longer needed, document why
the remaining users still include buffer_head.h, and s/this file/core VM/
in filemap.c
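The substitution is behavior-preserving on the assumption, which the patch implies, that page_has_buffers() simply tests the PG_private flag; checking the flag directly then needs no buffer_head knowledge. A toy model of that relationship, with made-up bit numbers (they need not match the kernel's PG_* numbering) and hypothetical names:

```c
#include <stdbool.h>

/* Hypothetical flag bits, chosen only for this illustration. */
enum { TOY_PG_locked = 0, TOY_PG_private = 1 };

struct toy_flag_page {
	unsigned long flags;
};

static bool toy_page_private(const struct toy_flag_page *page)
{
	return (page->flags >> TOY_PG_private) & 1UL;
}

/* Assumed relationship: "has buffers" is nothing more than PG_private. */
#define toy_page_has_buffers(page) toy_page_private(page)
```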


diff -urpN mm3-2.5.67-2B/mm/filemap.c mm3-2.5.67-3B/mm/filemap.c
--- mm3-2.5.67-2B/mm/filemap.c	2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-3B/mm/filemap.c	2003-04-14 22:43:08.000000000 -0700
@@ -31,12 +31,11 @@
  * This is needed for the following functions:
  *  - try_to_release_page
  *  - block_invalidatepage
- *  - page_has_buffers
  *  - generic_osync_inode
  *
- * FIXME: remove all knowledge of the buffer layer from this file
+ * FIXME: remove all knowledge of the buffer layer from the core VM
  */
-#include <linux/buffer_head.h>
+#include <linux/buffer_head.h> /* for generic_osync_inode */
 
 #include <asm/uaccess.h>
 #include <asm/mman.h>
diff -urpN mm3-2.5.67-2B/mm/swap.c mm3-2.5.67-3B/mm/swap.c
--- mm3-2.5.67-2B/mm/swap.c	2003-04-07 10:31:05.000000000 -0700
+++ mm3-2.5.67-3B/mm/swap.c	2003-04-14 22:43:49.000000000 -0700
@@ -21,7 +21,7 @@
 #include <linux/pagevec.h>
 #include <linux/init.h>
 #include <linux/mm_inline.h>
-#include <linux/buffer_head.h>
+#include <linux/buffer_head.h>	/* for try_to_release_page() */
 #include <linux/percpu.h>
 
 /* How many pages do we try to swap or page in/out together? */
diff -urpN mm3-2.5.67-2B/mm/swap_state.c mm3-2.5.67-3B/mm/swap_state.c
--- mm3-2.5.67-2B/mm/swap_state.c	2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-3B/mm/swap_state.c	2003-04-14 22:28:20.000000000 -0700
@@ -13,7 +13,6 @@
 #include <linux/init.h>
 #include <linux/pagemap.h>
 #include <linux/backing-dev.h>
-#include <linux/buffer_head.h>	/* block_sync_page() */
 
 #include <asm/pgtable.h>
 
@@ -187,7 +186,7 @@ void delete_from_swap_cache(struct page 
 
 	BUG_ON(!PageLocked(page));
 	BUG_ON(PageWriteback(page));
-	BUG_ON(page_has_buffers(page));
+	BUG_ON(PagePrivate(page));
   
 	entry.val = page->index;
 
@@ -236,7 +235,7 @@ int move_from_swap_cache(struct page *pa
 
 	BUG_ON(!PageLocked(page));
 	BUG_ON(PageWriteback(page));
-	BUG_ON(page_has_buffers(page));
+	BUG_ON(PagePrivate(page));
 
 	entry.val = page->index;
 
diff -urpN mm3-2.5.67-2B/mm/swapfile.c mm3-2.5.67-3B/mm/swapfile.c
--- mm3-2.5.67-2B/mm/swapfile.c	2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-3B/mm/swapfile.c	2003-04-14 22:27:57.000000000 -0700
@@ -15,7 +15,6 @@
 #include <linux/namei.h>
 #include <linux/shm.h>
 #include <linux/blkdev.h>
-#include <linux/buffer_head.h>
 #include <linux/writeback.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
@@ -300,7 +299,7 @@ int remove_exclusive_swap_page(struct pa
 	struct swap_info_struct * p;
 	swp_entry_t entry;
 
-	BUG_ON(page_has_buffers(page));
+	BUG_ON(PagePrivate(page));
 	BUG_ON(!PageLocked(page));
 
 	if (!PageSwapCache(page))
@@ -355,7 +354,7 @@ void free_swap_and_cache(swp_entry_t ent
 	if (page) {
 		int one_user;
 
-		BUG_ON(page_has_buffers(page));
+		BUG_ON(PagePrivate(page));
 		page_cache_get(page);
 		one_user = (page_count(page) == 2);
 		/* Only cache user (+us), or swap space full? Free it! */
diff -urpN mm3-2.5.67-2B/mm/vmscan.c mm3-2.5.67-3B/mm/vmscan.c
--- mm3-2.5.67-2B/mm/vmscan.c	2003-04-14 18:08:15.000000000 -0700
+++ mm3-2.5.67-3B/mm/vmscan.c	2003-04-14 22:45:19.000000000 -0700
@@ -22,7 +22,8 @@
 #include <linux/writeback.h>
 #include <linux/suspend.h>
 #include <linux/blkdev.h>
-#include <linux/buffer_head.h>		/* for try_to_release_page() */
+#include <linux/buffer_head.h>	/* for try_to_release_page(),
+					buffer_heads_over_limit */
 #include <linux/mm_inline.h>
 #include <linux/pagevec.h>
 #include <linux/backing-dev.h>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.67-mm3
  2003-04-15  4:17   ` 2.5.67-mm3 William Lee Irwin III
  2003-04-15  4:31     ` 2.5.67-mm3 Andrew Morton
@ 2003-04-15  5:52     ` Antonio Vargas
  2003-04-15  5:52       ` 2.5.67-mm3 William Lee Irwin III
  1 sibling, 1 reply; 21+ messages in thread
From: Antonio Vargas @ 2003-04-15  5:52 UTC (permalink / raw)
  To: William Lee Irwin III, Andrew Morton, linux-kernel, linux-mm

On Mon, Apr 14, 2003 at 09:17:59PM -0700, William Lee Irwin III wrote:
> On Mon, Apr 14, 2003 at 07:00:57PM -0700, William Lee Irwin III wrote:
> > Hence, this "FIXME: do not do for zone highmem". Presumably this is a
> 
> Another FIXME patch:
> 
> 
> It's a bit of an open question as to how much of a difference this one
> makes now, but it says "FIXME". fault_in_pages_writeable() and 
> fault_in_pages_readable() have a limited "range" with respect to the
> size of the region they can prefault; as they are now, they are only
> meant to handle spanning a page boundary. This converts them to iterate
> over the virtual address range specified and so touch each virtual page
> within it once as specified. As per the comment within the "FIXME",
> this is only an issue if PAGE_SIZE < PAGE_CACHE_SIZE.
> 
> [patch snip]
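The range-wide prefault described in the quoted patch text can be sketched in userspace as follows. This is an analogue, not the snipped patch: `touch_pages_writeable()` is a hypothetical name, it writes each touched byte back unchanged so it is safe on live data, and it simply crashes on a bad address instead of returning an error the way a kernel user-access helper would. The key point is that it steps from page boundary to page boundary until it passes the last byte, so a region of any size has every overlapping page touched exactly once.

```c
#include <stddef.h>
#include <stdint.h>

/* Touch one byte in every page overlapping [addr, addr + size).
 * Returns the number of pages touched.  page_size must be a power of two. */
static size_t touch_pages_writeable(char *addr, size_t size, size_t page_size)
{
	if (size == 0)
		return 0;

	uintptr_t p = (uintptr_t)addr;
	uintptr_t last = p + size - 1;		/* last byte in the range */
	size_t touched = 0;

	while (p <= last) {
		volatile char *c = (volatile char *)p;

		*c = *c;	/* read, then write back: forces a writeable mapping */
		touched++;
		p = (p & ~(uintptr_t)(page_size - 1)) + page_size;	/* next page boundary */
	}
	return touched;
}
```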

Page clustering? I did a simple patch yesterday called "cow-ahead", which
may be related: on a write to a COW page, it breaks the COW from several pages
at the same time. The implementation survived a complete debian 2.2 boot
and a fork bomb. Please have a look. The idea came from a discussion with
Martin J. Bligh... we liked the name too much not to implement it.

Greets, Antonio.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.67-mm3
  2003-04-15  5:52     ` 2.5.67-mm3 Antonio Vargas
@ 2003-04-15  5:52       ` William Lee Irwin III
  2003-04-15 15:09         ` 2.5.67-mm3 Antonio Vargas
  0 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-15  5:52 UTC (permalink / raw)
  To: Antonio Vargas; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, Apr 14, 2003 at 09:17:59PM -0700, William Lee Irwin III wrote:
>> It's a bit of an open question as to how much of a difference this one
>> makes now, but it says "FIXME". fault_in_pages_writeable() and 
>> fault_in_pages_readable() have a limited "range" with respect to the
>> size of the region they can prefault; as they are now, they are only
>> meant to handle spanning a page boundary. This converts them to iterate
>> over the virtual address range specified and so touch each virtual page
>> within it once as specified. As per the comment within the "FIXME",
>> this is only an issue if PAGE_SIZE < PAGE_CACHE_SIZE.
>> [patch snip]

On Tue, Apr 15, 2003 at 07:52:29AM +0200, Antonio Vargas wrote:
> Page clustering? I did a simple patch yesterday called "cow-ahead", which
> may be related: on a write to a COW page, it breaks the COW from several pages
> at the same time. The implementation survived a complete debian 2.2 boot
> and a fork bomb. Please have a look. The idea came from a discussion with
> Martin J. Bligh... we liked the name too much not to implement it.

I apologize if the name is deceiving, but it's conventional. I saw your
patch and it could very well be valuable, but it would be called
"prefaulting" or "faultahead". Page clustering is divorcing the TLB
mapping unit from the kernel's internal allocation unit, specifically,
enlarging the kernel's allocation unit for reductions in the size of
certain data structures (for PAE, the most important of these is the
mem_map[] array but the pagecache radix trees also see good reductions),
and for physical contiguity benefits in things like io as they are
applicable (it is not applicable to workloads with many small files or
for workloads with predominantly small io sizes).
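The mem_map[] saving described above is simple arithmetic: there is one struct page per allocation unit, so enlarging the unit by a factor of 2^k shrinks the array by the same factor. A minimal sketch, assuming 4 KB MMU pages and a power-of-two clustering factor (`mem_map_entries()` and its parameters are hypothetical):

```c
#include <stdint.h>

/* One struct page per allocation unit of (mmu_page_size << clustering_shift)
 * bytes; returns how many mem_map[] entries cover mem_bytes, rounded up. */
static uint64_t mem_map_entries(uint64_t mem_bytes, uint64_t mmu_page_size,
				unsigned int clustering_shift)
{
	uint64_t unit = mmu_page_size << clustering_shift;

	return (mem_bytes + unit - 1) / unit;
}
```

For example, 64 GB of RAM (the PAE limit) needs 16,777,216 entries with 4 KB allocation units, but 4,194,304 with 16 KB units.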

The article on kerneltrap.org on the subject should have more pointers
to explanatory posts etc. to get a better idea of what's going on.

Also important is to properly credit Hugh Dickins with the original
2.4 implementation of page clustering, which for optimality and
correctness and cleanliness is superior to the current state of my own
for 2.5, and is the source base from which my implementation is derived.

-- wli

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.67-mm3
  2003-04-15  5:52       ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-15 15:09         ` Antonio Vargas
  0 siblings, 0 replies; 21+ messages in thread
From: Antonio Vargas @ 2003-04-15 15:09 UTC (permalink / raw)
  To: William Lee Irwin III, Antonio Vargas, Andrew Morton,
	linux-kernel, linux-mm, mbligh

On Mon, Apr 14, 2003 at 10:52:56PM -0700, William Lee Irwin III wrote:
> On Mon, Apr 14, 2003 at 09:17:59PM -0700, William Lee Irwin III wrote:
> >> It's a bit of an open question as to how much of a difference this one
> >> makes now, but it says "FIXME". fault_in_pages_writeable() and 
> >> fault_in_pages_readable() have a limited "range" with respect to the
> >> size of the region they can prefault; as they are now, they are only
> >> meant to handle spanning a page boundary. This converts them to iterate
> >> over the virtual address range specified and so touch each virtual page
> >> within it once as specified. As per the comment within the "FIXME",
> >> this is only an issue if PAGE_SIZE < PAGE_CACHE_SIZE.
> >> [patch snip]
> 
> On Tue, Apr 15, 2003 at 07:52:29AM +0200, Antonio Vargas wrote:
> > Page clustering? I did a simple patch yesterday called "cow-ahead", which
> > may be related: on a write to a COW page, it breaks the COW from several pages
> > at the same time. The implementation survived a complete debian 2.2 boot
> > and a fork bomb. Please have a look. The idea came from a discussion with
> > Martin J. Bligh... we liked the name too much not to implement it.
> 
> I apologize if the name is deceiving, but it's conventional. I saw your
> patch and it could very well be valuable, but it would be called
> "prefaulting" or "faultahead". Page clustering is divorcing the TLB
> mapping unit from the kernel's internal allocation unit, specifically,
> enlarging the kernel's allocation unit for reductions in the size of
> certain data structures (for PAE, the most important of these is the
> mem_map[] array but the pagecache radix trees also see good reductions),
> and for physical contiguity benefits in things like io as they are
> applicable (it is not applicable to workloads with many small files or
> for workloads with predominantly small io sizes).
> 
> The article on kerneltrap.org on the subject should have more pointers
> to explanatory posts etc. to get a better idea of what's going on.
> 
> Also important is to properly credit Hugh Dickins with the original
> 2.4 implementation of page clustering, which for optimality and
> correctness and cleanliness is superior to the current state of my own
> for 2.5, and is the source base from which my implementation is derived.
> 
> 
> -- wli

I mentioned page clustering instead of page-fault clustering, so it
was my confusion. I recall seeing lkml posts which talked about
"the pgcl patch" and wondered what it was about until I saw
the kerneltrap article. Looks like a nice way to speed up
the system :)

My trusty 50 MHz 68030 AmigaOS machine was faster with a 32 KB filesystem
block size for some tasks; I don't know whether that carries over to Linux,
with its much more advanced memory and file management, readahead and
related improvements.

My patch booted fine on UML, I'll try to boot a physical machine with it
and try to time a kernel compile.

Besides, if you look closer at the patch, you can see it's doing
the "faultahead" only for cow pages, thus giving it the name "cowahead".

Since taking a page fault must be very expensive on today's machines
due to deep pipelines, I wonder if we should keep a record of where each
vma last cow-faulted and dynamically adjust the cowahead window,
just like file access manages its readahead window.
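One way such an adaptive window could look, modelled loosely on a readahead-style grow-on-sequential, reset-on-random heuristic. Everything here is hypothetical: the per-VMA `struct cowahead_state`, the `COWAHEAD_MIN`/`COWAHEAD_MAX` bounds and the doubling policy are illustrative choices, not taken from the cow-ahead patch or from the kernel's readahead code.

```c
/* Hypothetical per-VMA state for an adaptive cow-ahead window. */
struct cowahead_state {
	unsigned long last_fault_pfn;	/* virtual page number of the last COW fault */
	unsigned int window;		/* pages broken at that fault; 0 = none yet */
};

#define COWAHEAD_MIN 1u
#define COWAHEAD_MAX 16u

/* Returns how many pages to break COW on at this fault, updating state. */
static unsigned int cowahead_next_window(struct cowahead_state *s,
					 unsigned long fault_pfn)
{
	if (s->window != 0 && fault_pfn == s->last_fault_pfn + s->window) {
		/* Faulted just past the previous window: looks sequential, grow. */
		s->window *= 2;
		if (s->window > COWAHEAD_MAX)
			s->window = COWAHEAD_MAX;
	} else {
		/* First fault or a jump elsewhere: start small again. */
		s->window = COWAHEAD_MIN;
	}
	s->last_fault_pfn = fault_pfn;
	return s->window;
}
```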

Greets, Antonio.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.67-mm3
  2003-04-14  8:53 2.5.67-mm3 Andrew Morton
  2003-04-14 11:03 ` 2.5.67-mm3 Bill Huey
  2003-04-15  2:00 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-16  2:21 ` William Lee Irwin III
  2003-04-16  2:40   ` 2.5.67-mm3 William Lee Irwin III
  2 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-16  2:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

On Mon, Apr 14, 2003 at 01:53:13AM -0700, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.67/2.5.67-mm3/
> 
> A bunch of new fixes, and a framebuffer update.  This should work a bit
> better than -mm2.

follow_hugetlb_page() behaved improperly if its starting address was
not hugepage-aligned. It looked a bit unclean too, so I rewrote it.
This fixes a bug, and more importantly, makes the thing readable by
something other than a compiler (e.g. programmers).


diff -urpN linux-2.5.67-bk6/arch/i386/mm/hugetlbpage.c htlb-2.5.67-bk6-1/arch/i386/mm/hugetlbpage.c
--- linux-2.5.67-bk6/arch/i386/mm/hugetlbpage.c	2003-04-07 10:32:49.000000000 -0700
+++ htlb-2.5.67-bk6-1/arch/i386/mm/hugetlbpage.c	2003-04-15 18:58:07.000000000 -0700
@@ -129,37 +129,45 @@ nomem:
 int
 follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		    struct page **pages, struct vm_area_struct **vmas,
-		    unsigned long *st, int *length, int i)
+		    unsigned long *position, int *length, int i)
 {
-	pte_t *ptep, pte;
-	unsigned long start = *st;
-	unsigned long pstart;
-	int len = *length;
-	struct page *page;
+	unsigned long vpfn, vaddr = *position;
+	int remainder = *length;
+
+	WARN_ON(!is_vm_hugetlb_page(vma));
 
-	do {
-		pstart = start;
-		ptep = huge_pte_offset(mm, start);
-		pte = *ptep;
+	vpfn = vaddr/PAGE_SIZE;
+	while (vaddr < vma->vm_end && remainder) {
 
-back1:
-		page = pte_page(pte);
 		if (pages) {
-			page += ((start & ~HPAGE_MASK) >> PAGE_SHIFT);
+			pte_t *pte;
+			struct page *page;
+
+			pte = huge_pte_offset(mm, vaddr);
+
+			/* hugetlb should be locked, and hence, prefaulted */
+			WARN_ON(!pte || pte_none(*pte));
+
+			page = &pte_page(*pte)[vpfn % (HPAGE_SIZE/PAGE_SIZE)];
+
+			WARN_ON(!PageCompound(page));
+
 			get_page(page);
 			pages[i] = page;
 		}
+
 		if (vmas)
 			vmas[i] = vma;
-		i++;
-		len--;
-		start += PAGE_SIZE;
-		if (((start & HPAGE_MASK) == pstart) && len &&
-				(start < vma->vm_end))
-			goto back1;
-	} while (len && start < vma->vm_end);
-	*length = len;
-	*st = start;
+
+		vaddr += PAGE_SIZE;
+		++vpfn;
+		--remainder;
+		++i;
+	}
+
+	*length = remainder;
+	*position = vaddr;
+
 	return i;
 }
 
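The index arithmetic in the rewritten loop can be checked in isolation. A minimal sketch assuming non-PAE i386 geometry (4 KB base pages, 4 MB huge pages); `walk_hugetlb_range()` and `struct toy_subpage` are hypothetical, and no page tables are consulted. It only shows that deriving the subpage index from a running vpfn stays correct when the walk starts part-way into one huge page and crosses into the next.

```c
#include <stddef.h>

/* Assumed geometry: 4 KB base pages, 4 MB huge pages. */
#define TOY_PAGE_SIZE		4096UL
#define TOY_HPAGE_SIZE		(4UL * 1024 * 1024)
#define TOY_PAGES_PER_HPAGE	(TOY_HPAGE_SIZE / TOY_PAGE_SIZE)

struct toy_subpage {
	unsigned long hpage;	/* which huge page the base page lies in */
	unsigned long index;	/* base-page index within that huge page */
};

/* Walk npages base pages starting at a possibly unaligned vaddr. */
static size_t walk_hugetlb_range(unsigned long vaddr, size_t npages,
				 struct toy_subpage *out)
{
	unsigned long vpfn = vaddr / TOY_PAGE_SIZE;

	for (size_t i = 0; i < npages; i++, vpfn++) {
		out[i].hpage = vpfn / TOY_PAGES_PER_HPAGE;
		out[i].index = vpfn % TOY_PAGES_PER_HPAGE;
	}
	return npages;
}
```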

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.5.67-mm3
  2003-04-16  2:21 ` 2.5.67-mm3 William Lee Irwin III
@ 2003-04-16  2:40   ` William Lee Irwin III
  0 siblings, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2003-04-16  2:40 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm

On Tue, Apr 15, 2003 at 07:21:54PM -0700, William Lee Irwin III wrote:
> follow_hugetlb_page() behaved improperly if its starting address was
> not hugepage-aligned. It looked a bit unclean too, so I rewrote it.
> This fixes a bug, and more importantly, makes the thing readable by
> something other than a compiler (e.g. programmers).

And this one fixes an overflow when there is more than 4GB of hugetlb memory:


diff -urpN htlb-2.5.67-bk6-1/arch/i386/mm/hugetlbpage.c htlb-2.5.67-bk6-2/arch/i386/mm/hugetlbpage.c
--- htlb-2.5.67-bk6-1/arch/i386/mm/hugetlbpage.c	2003-04-15 18:58:07.000000000 -0700
+++ htlb-2.5.67-bk6-2/arch/i386/mm/hugetlbpage.c	2003-04-15 19:25:30.000000000 -0700
@@ -482,9 +482,7 @@ int hugetlb_report_meminfo(char *buf)
 
 int is_hugepage_mem_enough(size_t size)
 {
-	if (size > (htlbpagemem << HPAGE_SHIFT))
-		return 0;
-	return 1;
+	return (size + ~HPAGE_MASK)/HPAGE_SIZE <= htlbpagemem;
 }
 
 /*
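The overflow can be reproduced with 32-bit unsigned arithmetic standing in for unsigned long on i386. This assumes 4 MB huge pages (HPAGE_SHIFT of 22, as on non-PAE i386); `enough_old()` and `enough_new()` are hypothetical names for the before and after versions of the check. With 1024 huge pages (4 GB) the old byte count wraps to zero, so even a 1 MB request is refused, whereas rounding the request up to huge pages and comparing page counts avoids the wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed i386 non-PAE geometry: 4 MB huge pages. */
#define TOY_HPAGE_SHIFT	22
#define TOY_HPAGE_SIZE	((uint32_t)1 << TOY_HPAGE_SHIFT)
#define TOY_HPAGE_MASK	(~(TOY_HPAGE_SIZE - 1))

/* Old check: converts the pool to bytes, which wraps modulo 2^32. */
static bool enough_old(uint32_t size, uint32_t htlbpagemem)
{
	uint32_t pool_bytes = htlbpagemem << TOY_HPAGE_SHIFT;

	return !(size > pool_bytes);
}

/* New check: converts the request to huge pages, rounding up. */
static bool enough_new(uint32_t size, uint32_t htlbpagemem)
{
	return (uint32_t)(size + ~TOY_HPAGE_MASK) / TOY_HPAGE_SIZE <= htlbpagemem;
}
```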

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2003-04-16  2:40 UTC | newest]

Thread overview: 21+ messages
2003-04-14  8:53 2.5.67-mm3 Andrew Morton
2003-04-14 11:03 ` 2.5.67-mm3 Bill Huey
2003-04-14 15:13   ` 2.5.67-mm3 Rudmer van Dijk
2003-04-15  1:03     ` 2.5.67-mm3 Bill Huey
2003-04-15  1:13       ` 2.5.67-mm3 Andrew Morton
2003-04-15  1:34         ` 2.5.67-mm3 Bill Huey
2003-04-15  9:38           ` 2.5.67-mm3 Rudmer van Dijk
2003-04-15  2:00 ` 2.5.67-mm3 William Lee Irwin III
2003-04-15  4:17   ` 2.5.67-mm3 William Lee Irwin III
2003-04-15  4:31     ` 2.5.67-mm3 Andrew Morton
2003-04-15  4:39       ` 2.5.67-mm3 William Lee Irwin III
2003-04-15  4:55         ` 2.5.67-mm3 Andrew Morton
2003-04-15  5:15           ` 2.5.67-mm3 William Lee Irwin III
2003-04-15  5:35             ` 2.5.67-mm3 Andrew Morton
2003-04-15  6:09           ` 2.5.67-mm3 William Lee Irwin III
2003-04-15  6:10             ` 2.5.67-mm3 William Lee Irwin III
2003-04-15  5:52     ` 2.5.67-mm3 Antonio Vargas
2003-04-15  5:52       ` 2.5.67-mm3 William Lee Irwin III
2003-04-15 15:09         ` 2.5.67-mm3 Antonio Vargas
2003-04-16  2:21 ` 2.5.67-mm3 William Lee Irwin III
2003-04-16  2:40   ` 2.5.67-mm3 William Lee Irwin III
