* 2.5.34-mm4
@ 2002-09-14  4:06 Andrew Morton
  2002-09-14  4:01 ` 2.5.34-mm4 Rik van Riel
  2002-09-15 10:50 ` 2.5.34-mm4 Axel Siebenwirth
From: Andrew Morton @ 2002-09-14  4:06 UTC
  To: lkml, linux-mm, lse-tech

url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/

Some additional work has been performed on the new, faster
sleep/wakeup facilities.

I have converted TCP/IPV4 over to use the faster wakeups.  It would
be appreciated if the people who are interested in (and set up for
testing) high performance networking could test this out.  Note
however that there is no benefit to select()/poll() - converting
those paths would be quite a large change.

So please bear in mind that this code will only help if applications
are generally sleeping in accept(), connect(), etc.  At this stage I'd
like to know whether this work is something which should be pursued
further - let's be careful that the measurements are not swamped by
select()/poll() wakeups.
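
To give a feel for what the conversion looks like at each sleeping
call site, here is a minimal sketch.  The helper names below
(DEFINE_WAIT, prepare_to_wait_exclusive, finish_wait) follow the style
of wake-speedup.patch but are illustrative only, and the waitqueue and
wakeup condition are placeholders:

#include <linux/wait.h>
#include <linux/sched.h>

/* Old style: open-coded sleep loop, queue handling done by hand. */
static void sleep_on_event_old(wait_queue_head_t *wq, int *cond)
{
	DECLARE_WAITQUEUE(wait, current);

	add_wait_queue_exclusive(wq, &wait);
	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (*cond)
			break;
		schedule();
	}
	__set_current_state(TASK_RUNNING);
	remove_wait_queue(wq, &wait);
}

/* New style: the prepare/finish helpers take over the queue
 * manipulation on behalf of the sleeper. */
static void sleep_on_event_new(wait_queue_head_t *wq, int *cond)
{
	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait_exclusive(wq, &wait, TASK_INTERRUPTIBLE);
		if (*cond)
			break;
		schedule();
	}
	finish_wait(wq, &wait);
}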

The individual patches are:

http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/broken-out/wake-speedup.patch
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/broken-out/tcp-wakeups.patch

These apply against 2.5.26 and possibly earlier, and testing against
earlier kernels would be valid.  Thanks.



Changes have been made to /proc/stat which break top(1) and vmstat(1).
New versions are available at
http://www.zip.com.au/~akpm/linux/patches/procps-2.5.34-mm4.tar.gz
and newer versions will appear at
http://surriel.com/procps/

+aio-sync-iocb.patch

 Ben's AIO patch conflicted with the readv/writev patch.  This is
 Ben's patch reworked to fit on top of readv-writev.patch

+pagevec_lru_add.patch

 Fix a bogon which broke reiserfs4

+taka-writev.patch

 Hirokazu Takahashi's writev() speedup.

+vm-wakeups.patch

 Use the auto waitqueues in the VM and block layers.  Broken out of
 the wake-speedup patch.

+per-node-kswapd.patch

 David Hansen's per-NUMA-node kswapd patch.

+topology-api.patch

 Matthew Dobson's topology API.

+kswapd-reclaim-stats.patch

 Add `kswapd_steal' and `pgrefill' to /proc/vmstat.  The former indicates
 that, on a quick test, 99% of page reclaim is being performed by kswapd.
 (A small userspace sketch for computing that ratio follows this list.)

+iowait.patch

 Instrumentation to show how much time is spent in disk wait.  (Doesn't
 appear to come out in the new top(1) though?)

+tcp-wakeups.patch

 Use auto-waitqueues in TCP/IPV4
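
For the kswapd-reclaim-stats numbers mentioned above, something along
these lines can be used to eyeball the kswapd share on a running
kernel.  This is only a sketch: kswapd_steal is the counter named
above, but the total-reclaim counter (called `pgsteal' here) is an
assumption about the /proc/vmstat layout and may be named differently.

#include <stdio.h>
#include <string.h>

/*
 * Print what fraction of reclaimed pages was stolen by kswapd.
 * "pgsteal" is an assumed name for the total-reclaim counter;
 * adjust it to whatever /proc/vmstat actually reports.
 */
int main(void)
{
	char name[64];
	unsigned long val, pgsteal = 0, kswapd_steal = 0;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return 1;
	while (fscanf(f, "%63s %lu", name, &val) == 2) {
		if (!strcmp(name, "pgsteal"))
			pgsteal = val;
		else if (!strcmp(name, "kswapd_steal"))
			kswapd_steal = val;
	}
	fclose(f);
	if (pgsteal)
		printf("kswapd share: %lu%% of %lu reclaimed pages\n",
		       kswapd_steal * 100 / pgsteal, pgsteal);
	return 0;
}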




linus.patch
  cset-1.568.19.4-to-1.661.txt.gz

scsi_hack.patch
  Fix block-highmem for scsi

ext3-htree.patch
  Indexed directories for ext3

spin-lock-check.patch
  spinlock/rwlock checking infrastructure

rd-cleanup.patch
  Cleanup and fix the ramdisk driver (doesn't work right yet)

readv-writev.patch
  O_DIRECT support for readv/writev

aio-sync-iocb.patch
  Use a sync iocb for generic_file_read

llzpr.patch
  Reduce scheduling latency across zap_page_range

buffermem.patch
  Resurrect buffermem accounting

lpp.patch
  ia32 huge tlb pages

lpp-update.patch
  hugetlbpage fixes

reversemaps-leak.patch
  Fix reverse map accounting leak

sharedmem.patch
  Add /proc/meminfo:Mapped - the amount of memory which is mapped into pagetables

ext3-sb.patch
  u.ext3_sb -> generic_sbp

pagevec_lru_add.patch
  Run readpage before dropping the page refcount

oom-fix.patch
  Fix an OOM condition on big highmem machines

tlb-cleanup.patch
  Clean up the tlb gather code

dump-stack.patch
  arch-neutral dump_stack() function

wli-cleanup.patch
  random cleanups

madvise-move.patch
  move madvise implementation into mm/madvise.c

split-vma.patch
  VMA splitting patch

mmap-fixes.patch
  mmap.c cleanup and lock ranking fixes

buffer-ops-move.patch
  Move submit_bh() and ll_rw_block() into fs/buffer.c

slab-stats.patch
  Display total slab memory in /proc/meminfo

writeback-control.patch
  Cleanup and extension of the writeback paths

free_area_init-cleanup.patch
  free_area_init() code cleanup

alloc_pages-cleanup.patch
  alloc_pages cleanup and optimisation

statm_pgd_range-sucks.patch
  Remove the pagetable walk from /proc/<pid>/statm

remove-sync_thresh.patch
  Remove /proc/sys/vm/dirty_sync_thresh

taka-writev.patch
  Speed up writev

pf_nowarn.patch
  Fix up the handling of PF_NOWARN

jeremy.patch
  Spel Jermy's naim wright

queue-congestion.patch
  Infrastructure for communicating request queue congestion to the VM

nonblocking-ext2-preread.patch
  avoid ext2 inode prereads if the queue is congested

nonblocking-pdflush.patch
  non-blocking writeback infrastructure, use it for pdflush

nonblocking-vm.patch
  Non-blocking page reclaim

wake-speedup.patch
  Faster wakeup code

vm-wakeups.patch
  Use the faster wakeups in the VM and block layers

sync-helper.patch
  Speed up sys_sync() against multiple spindles

slabasap.patch
  Early and smarter shrinking of slabs

write-deadlock.patch
  Fix the generic_file_write-from-same-mmapped-page deadlock

buddyinfo.patch
  Add /proc/buddyinfo - stats on the free pages pool

free_area.patch
  Remove struct free_area_struct and free_area_t, use `struct free_area'

per-node-kswapd.patch
  Per-node kswapd instance

topology-api.patch
  NUMA topology API

radix_tree_gang_lookup.patch
  radix tree gang lookup

truncate_inode_pages.patch
  truncate/invalidate_inode_pages rewrite

proc_vmstat.patch
  Move the vm accounting out of /proc/stat

kswapd-reclaim-stats.patch
  Add kswapd_steal to /proc/vmstat

iowait.patch
  I/O wait statistics

tcp-wakeups.patch
  Use fast wakeups in TCP/IPV4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

* [PATCH](1/2) rmap14 for ac  (was: Re: 2.5.34-mm4)
@ 2002-09-16  2:47 Rik van Riel
From: Rik van Riel @ 2002-09-16  2:47 UTC
  To: Alan Cox; +Cc: linux-mm, Kernel Mailing List

On 16 Sep 2002, Alan Cox wrote:

> So send me rmap-14a patches by all means

And here is the patch that takes you from rmap14 to
rmap14a + small bugfixes.  Don't be fooled by the
bitkeeper changelog, there was a one-line (whitespace)
reject I had to fix ;)

please apply,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Spamtraps of the month:  september@surriel.com trac@trac.org


# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           mm/rmap.c	1.7     -> 1.8
#	include/linux/mmzone.h	1.12    -> 1.13
#	     mm/page_alloc.c	1.51    -> 1.53
#	         mm/vmscan.c	1.72    -> 1.77
#	        mm/filemap.c	1.71    -> 1.72
#	         mm/memory.c	1.55    -> 1.56
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/08/15	riel@imladris.surriel.com	1.677
# rmap 14 release
# --------------------------------------------
# 02/08/16	riel@imladris.surriel.com	1.678
# low latency zap_page_range also without preempt
# --------------------------------------------
# 02/08/16	riel@imladris.surriel.com	1.679
# remove unneeded pte_chain_unlock/lock pair from vmscan.c
# --------------------------------------------
# 02/08/16	riel@imladris.surriel.com	1.680
# semicolon day, fix typo in rmap.c w/ DEBUG_RMAP
# --------------------------------------------
# 02/08/17	riel@imladris.surriel.com	1.681
# fix smp lock in page_launder_zone (Arjan van de Ven)
# --------------------------------------------
# 02/08/17	riel@imladris.surriel.com	1.682
# Ingo Molnar's per-cpu pages
# --------------------------------------------
# 02/08/18	riel@imladris.surriel.com	1.683
# another (minor) smp fix for page_launder
# --------------------------------------------
# 02/08/18	riel@imladris.surriel.com	1.684
# - throughput tuning for page_launder
# - don't allocate swap space for pages we're not writing
# --------------------------------------------
# 02/08/18	riel@imladris.surriel.com	1.685
# rmap 14a
# --------------------------------------------
# 02/08/19	riel@imladris.surriel.com	1.686
# page_launder bug found by Andrew Morton
# --------------------------------------------
# 02/08/19	riel@imladris.surriel.com	1.687
# clean up mark_page_accessed
# --------------------------------------------
# 02/08/30	riel@imladris.surriel.com	1.688
# Alpha NUMA fix for Ingo's per-cpu pages
# --------------------------------------------
#
--- linux-2.4.20-pre5-ac6-rmap14/mm/filemap.c.orig	2002-09-15 23:39:25.000000000 -0300
+++ linux-2.4.20-pre5-ac6-rmap14/mm/filemap.c	2002-09-15 23:43:42.000000000 -0300
@@ -1383,20 +1383,17 @@
  */
 void mark_page_accessed(struct page *page)
 {
-	if (PageInactiveClean(page)) {
+	/* Mark the page referenced, AFTER checking for previous usage.. */
+	SetPageReferenced(page);
+
+	if (unlikely(PageInactiveClean(page))) {
 		struct zone_struct *zone = page_zone(page);
 		int free = zone->free_pages + zone->inactive_clean_pages;

 		activate_page(page);
 		if (free < zone->pages_low)
 			wakeup_kswapd(GFP_NOIO);
-		if (zone->free_pages < zone->pages_min)
-			fixup_freespace(zone, 1);
-		return;
 	}
-
-	/* Mark the page referenced, AFTER checking for previous usage.. */
-	SetPageReferenced(page);
 }

 /*
--- linux-2.4.20-pre5-ac6-rmap14/mm/memory.c.orig	2002-09-15 23:39:25.000000000 -0300
+++ linux-2.4.20-pre5-ac6-rmap14/mm/memory.c	2002-09-15 23:43:05.000000000 -0300
@@ -436,6 +436,9 @@

 		spin_unlock(&mm->page_table_lock);

+		if (current->need_resched)
+			schedule();
+
 		address += block;
 		size -= block;
 	}
--- linux-2.4.20-pre5-ac6-rmap14/mm/page_alloc.c.orig	2002-09-15 23:38:38.000000000 -0300
+++ linux-2.4.20-pre5-ac6-rmap14/mm/page_alloc.c	2002-09-15 23:43:05.000000000 -0300
@@ -10,6 +10,7 @@
  *  Reshaped it to be a zoned allocator, Ingo Molnar, Red Hat, 1999
  *  Discontiguous memory support, Kanoj Sarcar, SGI, Nov 1999
  *  Zone balancing, Kanoj Sarcar, SGI, Jan 2000
+ *  Per-CPU page pool, Ingo Molnar, Red Hat, 2001, 2002
  */

 #include <linux/config.h>
@@ -22,6 +23,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/mm_inline.h>
+#include <linux/smp.h>

 int nr_swap_pages;
 int nr_active_pages;
@@ -85,6 +87,7 @@
 	unsigned long index, page_idx, mask, flags;
 	free_area_t *area;
 	struct page *base;
+	per_cpu_t *per_cpu;
 	zone_t *zone;

 	/* Yes, think what happens when other parts of the kernel take
@@ -93,6 +96,13 @@
 	if (PageLRU(page))
 		lru_cache_del(page);

+	/*
+	 * This late check is safe because reserved pages do not
+	 * have a valid page->count. This trick avoids overhead
+	 * in __free_pages().
+	 */
+	if (PageReserved(page))
+		return;
 	if (page->buffers)
 		BUG();
 	if (page->mapping) {
@@ -129,7 +139,18 @@

 	area = zone->free_area + order;

-	spin_lock_irqsave(&zone->lock, flags);
+	per_cpu = zone->cpu_pages + smp_processor_id();
+
+	__save_flags(flags);
+	__cli();
+	if (!order && (per_cpu->nr_pages < per_cpu->max_nr_pages) && (free_high(zone) <= 0)) {
+		list_add(&page->list, &per_cpu->head);
+		per_cpu->nr_pages++;
+		__restore_flags(flags);
+		return;
+	}
+
+	spin_lock(&zone->lock);

 	zone->free_pages -= mask;

@@ -193,13 +214,32 @@
 static FASTCALL(struct page * rmqueue(zone_t *zone, unsigned int order));
 static struct page * rmqueue(zone_t *zone, unsigned int order)
 {
+	per_cpu_t *per_cpu = zone->cpu_pages + smp_processor_id();
 	free_area_t * area = zone->free_area + order;
 	unsigned int curr_order = order;
 	struct list_head *head, *curr;
 	unsigned long flags;
 	struct page *page;
+	int threshold = 0;
+
+	if (!(current->flags & PF_MEMALLOC))
+		 threshold = (per_cpu->max_nr_pages / 8);
+	__save_flags(flags);
+	__cli();

-	spin_lock_irqsave(&zone->lock, flags);
+	if (!order && (per_cpu->nr_pages>threshold)) {
+		if (unlikely(list_empty(&per_cpu->head)))
+			BUG();
+		page = list_entry(per_cpu->head.next, struct page, list);
+		list_del(&page->list);
+		per_cpu->nr_pages--;
+		__restore_flags(flags);
+
+		set_page_count(page, 1);
+		return page;
+	}
+
+ 	spin_lock(&zone->lock);
 	do {
 		head = &area->free_list;
 		curr = head->next;
@@ -596,7 +636,7 @@

 void __free_pages(struct page *page, unsigned int order)
 {
-	if (!PageReserved(page) && put_page_testzero(page))
+	if (put_page_testzero(page))
 		__free_pages_ok(page, order);
 }

@@ -879,6 +919,7 @@

 	offset = lmem_map - mem_map;
 	for (j = 0; j < MAX_NR_ZONES; j++) {
+		int k;
 		zone_t *zone = pgdat->node_zones + j;
 		unsigned long mask, extrafree = 0;
 		unsigned long size, realsize;
@@ -891,6 +932,18 @@
 		printk("zone(%lu): %lu pages.\n", j, size);
 		zone->size = size;
 		zone->name = zone_names[j];
+
+		for (k = 0; k < NR_CPUS; k++) {
+			per_cpu_t *per_cpu = zone->cpu_pages + k;
+
+			INIT_LIST_HEAD(&per_cpu->head);
+			per_cpu->nr_pages = 0;
+			per_cpu->max_nr_pages = realsize / smp_num_cpus / 128;
+			if (per_cpu->max_nr_pages > MAX_PER_CPU_PAGES)
+				per_cpu->max_nr_pages = MAX_PER_CPU_PAGES;
+			else if (!per_cpu->max_nr_pages)
+				per_cpu->max_nr_pages = 1;
+		}
 		zone->lock = SPIN_LOCK_UNLOCKED;
 		zone->zone_pgdat = pgdat;
 		zone->free_pages = 0;
--- linux-2.4.20-pre5-ac6-rmap14/mm/rmap.c.orig	2002-09-15 23:38:38.000000000 -0300
+++ linux-2.4.20-pre5-ac6-rmap14/mm/rmap.c	2002-09-15 23:43:05.000000000 -0300
@@ -95,7 +95,7 @@
 		BUG();
 	if (!pte_present(*ptep))
 		BUG();
-	if (!ptep_to_mm(ptep));
+	if (!ptep_to_mm(ptep))
 		BUG();
 #endif

--- linux-2.4.20-pre5-ac6-rmap14/mm/vmscan.c.orig	2002-09-15 23:39:25.000000000 -0300
+++ linux-2.4.20-pre5-ac6-rmap14/mm/vmscan.c	2002-09-15 23:43:05.000000000 -0300
@@ -214,7 +214,7 @@
 	int maxscan, cleaned_pages, target, maxlaunder, iopages;
 	struct list_head * entry, * next;

-	target = free_plenty(zone);
+	target = max(free_plenty(zone), zone->pages_min);
 	cleaned_pages = iopages = 0;

 	/* If we can get away with it, only flush 2 MB worth of dirty pages */
@@ -222,12 +222,12 @@
 		maxlaunder = 1000000;
 	else {
 		maxlaunder = min_t(int, 512, zone->inactive_dirty_pages / 4);
-		maxlaunder = max(maxlaunder, free_plenty(zone));
+		maxlaunder = max(maxlaunder, free_plenty(zone) * 4);
 	}

 	/* The main launder loop. */
-rescan:
 	spin_lock(&pagemap_lru_lock);
+rescan:
 	maxscan = zone->inactive_dirty_pages;
 	entry = zone->inactive_dirty_list.prev;
 	next = entry->prev;
@@ -295,7 +295,6 @@
 			UnlockPage(page);
 			continue;
 		}
-		pte_chain_unlock(page);

 		/*
 		 * Anonymous process memory without backing store. Try to
@@ -303,8 +302,15 @@
 		 *
 		 * XXX: implement swap clustering ?
 		 */
-		pte_chain_lock(page);
 		if (page->pte_chain && !page->mapping && !page->buffers) {
+			/* Don't bother if we can't swap it out now. */
+			if (maxlaunder < 0) {
+				pte_chain_unlock(page);
+				UnlockPage(page);
+				list_del(entry);
+				list_add(entry, &zone->inactive_dirty_list);
+				continue;
+			}
 			page_cache_get(page);
 			pte_chain_unlock(page);
 			spin_unlock(&pagemap_lru_lock);
@@ -324,7 +330,7 @@
 		 * The page is mapped into the page tables of one or more
 		 * processes. Try to unmap it here.
 		 */
-		if (page->pte_chain) {
+		if (page->pte_chain && page->mapping) {
 			switch (try_to_unmap(page)) {
 				case SWAP_ERROR:
 				case SWAP_FAIL:
@@ -377,11 +383,11 @@
 		 * the page as well.
 		 */
 		if (page->buffers) {
-			spin_unlock(&pagemap_lru_lock);
-
 			/* To avoid freeing our page before we're done. */
 			page_cache_get(page);

+			spin_unlock(&pagemap_lru_lock);
+
 			if (try_to_release_page(page, gfp_mask)) {
 				if (!page->mapping) {
 					/*
@@ -405,10 +411,19 @@
 					 * slept; undo the stuff we did before
 					 * try_to_release_page and fall through
 					 * to the next step.
+					 * But only if the page is still on the inact. dirty
+					 * list.
 					 */
-					page_cache_release(page);

 					spin_lock(&pagemap_lru_lock);
+					/* Check if the page was removed from the list
+					 * while we looked the other way.
+					 */
+					if (!PageInactiveDirty(page)) {
+						page_cache_release(page);
+						continue;
+					}
+					page_cache_release(page);
 				}
 			} else {
 				/* failed to drop the buffers so stop here */
@@ -478,8 +493,10 @@

 	/* Clean up the remaining zones with a serious shortage, if any. */
 	for_each_zone(zone)
-		if (free_min(zone) >= 0)
-			freed += page_launder_zone(zone, gfp_mask, 1);
+		if (free_low(zone) >= 0) {
+			int fullflush = free_min(zone) > 0;
+			freed += page_launder_zone(zone, gfp_mask, fullflush);
+		}

 	return freed;
 }
--- linux-2.4.20-pre5-ac6-rmap14/include/linux/mmzone.h.orig	2002-09-15 23:39:25.000000000 -0300
+++ linux-2.4.20-pre5-ac6-rmap14/include/linux/mmzone.h	2002-09-15 23:43:04.000000000 -0300
@@ -27,6 +27,12 @@
 struct pglist_data;
 struct pte_chain;

+#define MAX_PER_CPU_PAGES 512
+typedef struct per_cpu_pages_s {
+	int			nr_pages, max_nr_pages;
+	struct list_head	head;
+} __attribute__((aligned(L1_CACHE_BYTES))) per_cpu_t;
+
 /*
  * On machines where it is needed (eg PCs) we divide physical memory
  * into multiple physical zones. On a PC we have 3 zones:
@@ -39,6 +45,7 @@
 	/*
 	 * Commonly accessed fields:
 	 */
+	per_cpu_t		cpu_pages[NR_CPUS];
 	spinlock_t		lock;
 	unsigned long		free_pages;
 	unsigned long		active_pages;



Thread overview: 19+ messages
2002-09-14  4:06 2.5.34-mm4 Andrew Morton
2002-09-14  4:01 ` 2.5.34-mm4 Rik van Riel
2002-09-15 10:50 ` 2.5.34-mm4 Axel Siebenwirth
2002-09-15 14:31   ` 2.5.34-mm4 Rik van Riel
2002-09-16 18:33     ` 2.5.34-mm4 Bill Davidsen
2002-09-15 17:41   ` 2.5.34-mm4 Andrew Morton
2002-09-15 17:36     ` 2.5.34-mm4 Rik van Riel
2002-09-15 17:39     ` 2.5.34-mm4 Rik van Riel
2002-09-15 17:49       ` 2.5.34-mm4 M. Edward Borasky
2002-09-15 17:54         ` 2.5.34-mm4 Rik van Riel
2002-09-15 18:55           ` 2.5.34-mm4 Andrew Morton
2002-09-15 18:56             ` 2.5.34-mm4 Rik van Riel
2002-09-16  1:33               ` 2.5.34-mm4 Alan Cox
2002-09-16  2:32                 ` [PATCH](1/2) rmap14 for ac (was: Re: 2.5.34-mm4) Rik van Riel
2002-09-15 19:10             ` [Lse-tech] Re: 2.5.34-mm4 Andi Kleen
2002-09-16 18:51               ` Bill Davidsen
2002-09-19  9:01                 ` Jens Axboe
2002-09-16 18:48             ` 2.5.34-mm4 Bill Davidsen
2002-09-16  2:47 [PATCH](1/2) rmap14 for ac (was: Re: 2.5.34-mm4) Rik van Riel
