Re: [patch] arca-vm-2.2.5

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Doug Ledford <dledford@redhat.com>
To: Andrea Arcangeli <andrea@e-mind.com>
Cc: Chuck Lever <cel@monkey.org>,
	linux-kernel@vger.rutgers.edu, linux-mm@kvack.org
Subject: Re: [patch] arca-vm-2.2.5
Date: Mon, 05 Apr 1999 22:14:26 -0400	[thread overview]
Message-ID: <37096E02.C9E53CE2@redhat.com> (raw)
In-Reply-To: <Pine.LNX.4.05.9904060128120.447-100000@laser.random>

[-- Attachment #1: Type: text/plain, Size: 2895 bytes --]

Andrea Arcangeli wrote:
> 
> On Mon, 5 Apr 1999, Chuck Lever wrote:
> 
> >buckets out of 32K buckets have hundreds of buffers).  i have a new hash
> >function that works very well, and even helps inter-run variance and
> >perceived interactive response.  i'll post more on this soon.
> 
> Cool! ;)) But could you tell me _how_ do you design an hash function? Are
> you doing math or do you use instinct? I never gone into the details of
> benchmarking or designing an hash function simply because I don't like
> fuzzy hash and instead I want to replace them with RB-trees all over the
> place (and I know the math about RB-trees), but I like to learn how to
> design an hash function anyway (even if I don't want to discover that
> myself without reading docs as usual ;).
> 
> >but also the page hash function uses the hash table size as a shift value
> >when computing the index, so it may combine the interesting bits in a
> >different (worse) way when you change the hash table size.  i'm planning
> >to instrument the page hash to see exactly what's going on.
> 
> Agreed. This is true. I thought about that and I am resizing the hash
> table size to the original 11 bit now (since you are confirming that I
> broken the hash function).
> 
> >IMHO, i'd think if the buffer cache has a large hash table, you'd want the
> >page cache to have a table as large or larger.  both track roughly the
> >same number of objects...  and the page cache is probably used more often
> 
> Agreed.

Hmmm...I've talked about this a few times to Alan Cox and Stephen
Tweedie.  I didn't bother to instrument the hash function because in
this case I knew it was tuned to the size of the inode structs.  But, I
did implement a variable sized page cache hash table array.  I did this
because at 512MB of RAM, having only 2048 hash buckets (11 bits) became
a real bottleneck in page cache hash entry lookups.  Essentially, when
the number of physical pages per hash buckets gets too large, then hash
chains can grow excessively long and take entirely too long to look up
when the page is not in the page cache (or when it is but it's at the
end of a long chain).  Since these lookups occur as frequently as they
do, they get to be excessive quickly.  I'm attaching the patch I've been
using here for a while.  Feel free to try it out.  One of the best tests
for the effect of this patch is a bonnie run on a machine with more disk
speed than CPU power.  In that case the page cache lookup becomes a
bottleneck.  In other conditions, you see the same disk speed results
but with smaller % CPU utilization.  On a PII400 dual machine with 512MB
RAM and a 4 Cheetah RAID0 array it made a 10MByte/s difference in
throughput.  On another machine it made a roughly 6 to 8% difference in
CPU utilization.


-- 
  Doug Ledford   <dledford@redhat.com>
   Opinions expressed are my own, but
      they should be everybody's.

[-- Attachment #2: page-2.2.2.patch --]
[-- Type: application/octet-stream, Size: 3509 bytes --]

--- linux/include/linux/pagemap.h.page	Thu Jan 28 17:43:17 1999
+++ linux/include/linux/pagemap.h	Mon Mar  8 07:37:54 1999
@@ -17,13 +17,13 @@
 	return PAGE_OFFSET + PAGE_SIZE * (page - mem_map);
 }
 
-#define PAGE_HASH_BITS 11
-#define PAGE_HASH_SIZE (1 << PAGE_HASH_BITS)
-
 #define PAGE_AGE_VALUE 16
-
 extern unsigned long page_cache_size; /* # of pages currently in the hash table */
-extern struct page * page_hash_table[PAGE_HASH_SIZE];
+extern struct page ** page_hash_table;
+extern unsigned long page_hash_size;
+extern unsigned long page_hash_mask;
+extern unsigned long page_hash_bits;
+extern long page_cache_init(long, long);
 
 /*
  * We use a power-of-two hash table to avoid a modulus,
@@ -35,8 +35,8 @@
 {
 #define i (((unsigned long) inode)/(sizeof(struct inode) & ~ (sizeof(struct inode) - 1)))
 #define o (offset >> PAGE_SHIFT)
-#define s(x) ((x)+((x)>>PAGE_HASH_BITS))
-	return s(i+o) & (PAGE_HASH_SIZE-1);
+#define s(x) ((x)+((x)>>page_hash_bits))
+	return s(i+o) & page_hash_mask;
 #undef i
 #undef o
 #undef s
--- linux/init/main.c.page	Mon Feb 22 22:44:53 1999
+++ linux/init/main.c	Mon Mar  8 07:37:54 1999
@@ -19,6 +19,7 @@
 #include <linux/utsname.h>
 #include <linux/ioport.h>
 #include <linux/init.h>
+#include <linux/pagemap.h>
 #include <linux/smp_lock.h>
 #include <linux/blk.h>
 #include <linux/hdreg.h>
@@ -1149,6 +1150,7 @@
 		initrd_start = 0;
 	}
 #endif
+	memory_start = page_cache_init(memory_start, memory_end);
 	mem_init(memory_start,memory_end);
 	kmem_cache_sizes_init();
 #ifdef CONFIG_PROC_FS
--- linux/mm/filemap.c.page	Mon Feb 22 22:44:53 1999
+++ linux/mm/filemap.c	Mon Mar  8 07:37:54 1999
@@ -13,6 +13,7 @@
 #include <linux/shm.h>
 #include <linux/mman.h>
 #include <linux/locks.h>
+#include <linux/init.h>
 #include <linux/pagemap.h>
 #include <linux/swap.h>
 #include <linux/smp_lock.h>
@@ -23,6 +24,7 @@
 
 #include <asm/pgtable.h>
 #include <asm/uaccess.h>
+#include <asm/io.h>
 
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
@@ -32,7 +34,45 @@
  */
 
 unsigned long page_cache_size = 0;
-struct page * page_hash_table[PAGE_HASH_SIZE];
+unsigned long page_hash_size = 0;
+unsigned long page_hash_bits = 0;
+unsigned long page_hash_mask = 0;
+struct page ** page_hash_table = NULL;
+
+/*
+ * The init function that starts off our page cache
+ */
+long __init page_cache_init( long mem_begin, long mem_end )
+{
+	long cache_begin;
+	long i;
+
+	for(i=0; i < sizeof(long) * 8; i++)
+		if(virt_to_phys((void *)mem_end) & (1<<i))
+			page_hash_bits = i;
+	page_hash_bits -= (PAGE_SHIFT + 2);
+	if(page_hash_bits < 10)
+		page_hash_bits = 10;
+	cache_begin = (mem_begin + (PAGE_SIZE - 1)) & ~PAGE_SIZE;
+	page_hash_size = 1 << page_hash_bits;
+	page_hash_mask = page_hash_size - 1;
+	if ( ((virt_to_phys((void *)cache_begin) <= 0xfffff) && 
+	      (virt_to_phys((void *)cache_begin) >= (0x9f000 - page_hash_size))) )
+	{
+		/*
+		 * Our structure would fall into the middle of the upper 
+		 * BIOS area of the computer, go above it to the 1MB mark
+		 * and start from there
+		 */
+		cache_begin = (long)phys_to_virt(0x100000);
+	}
+	page_hash_table = (struct page **)cache_begin;
+	memset(page_hash_table, 0, page_hash_size * sizeof(void));
+	printk(KERN_INFO "page_cache_init: allocated %ldk of RAM with %ld hash "
+		"buckets.\n", (page_hash_size * sizeof(void *)) / 1024,
+		page_hash_size);
+	return(cache_begin + (page_hash_size * sizeof(void *)));
+}
 
 /*
  * Simple routines for both non-shared and shared mappings.

next prev parent reply	other threads:[~1999-04-06  2:15 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1999-04-01 23:32 Andrea Arcangeli
1999-04-04 21:07 ` Chuck Lever
1999-04-05  0:22   ` Andrea Arcangeli
1999-04-05 13:23     ` Mark Hemment
1999-04-05 15:56       ` Andrea Arcangeli
1999-04-07 11:28         ` [patch] only-one-cache-query [was Re: [patch] arca-vm-2.2.5] Andrea Arcangeli
1999-04-07 13:06           ` Stephen C. Tweedie
1999-04-07 13:49             ` Andrea Arcangeli
1999-04-07 13:42           ` Andrea Arcangeli
1999-04-07 13:47           ` Ingo Molnar
1999-04-07 14:08             ` Andrea Arcangeli
1999-04-05 20:24       ` [patch] arca-vm-2.2.5 Horst von Brand
1999-04-05 23:25         ` Andrea Arcangeli
1999-04-05 23:37           ` Horst von Brand
1999-04-06  1:23             ` Andrea Arcangeli
1999-04-17 11:12       ` Andrea Arcangeli
1999-04-05 21:31     ` Chuck Lever
1999-04-06  0:15       ` Andrea Arcangeli
1999-04-06  2:14         ` Doug Ledford [this message]
1999-04-06 13:04           ` Andrea Arcangeli
1999-04-06 21:31             ` Stephen C. Tweedie
1999-04-06 22:27               ` Andrea Arcangeli
1999-04-07 12:27                 ` Stephen C. Tweedie
1999-04-25  3:22                   ` Chuck Lever
1999-04-06  5:52         ` Chuck Lever
1999-04-06 13:09           ` Andrea Arcangeli
1999-04-06 16:19           ` Eric W. Biederman
1999-04-06 20:26             ` Andrea Arcangeli
1999-04-07  5:00               ` Eric W. Biederman
1999-04-07 11:36                 ` Andrea Arcangeli
1999-04-06 14:02       ` Stephen C. Tweedie
1999-04-06 15:38         ` Chuck Lever
1999-04-06 17:16           ` Andrea Arcangeli
1999-04-06 18:07             ` Andrea Arcangeli
1999-04-06 21:22               ` Stephen C. Tweedie
1999-04-06 22:19                 ` Ingo Molnar
1999-04-06 22:40                   ` David Miller
1999-04-06 22:49                     ` Ingo Molnar
1999-04-06 22:53                       ` David Miller
1999-04-07 15:59                         ` Gabriel Paubert
1999-04-07 21:07                           ` Arvind Sankar
1999-04-09  6:58                             ` Eric W. Biederman
1999-04-09  9:27                               ` Gabriel Paubert
1999-04-09 15:40                                 ` Eric W. Biederman
1999-04-08  8:09                   ` Carlo Daffara
1999-04-06 22:31                 ` Andrea Arcangeli
1999-04-06 20:47             ` Chuck Lever
1999-04-06 21:04               ` Andrea Arcangeli
1999-04-06 21:11               ` Stephen C. Tweedie
1999-04-06 14:00     ` Stephen C. Tweedie
1999-04-06 16:29       ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=37096E02.C9E53CE2@redhat.com \
    --to=dledford@redhat.com \
    --cc=andrea@e-mind.com \
    --cc=cel@monkey.org \
    --cc=linux-kernel@vger.rutgers.edu \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox