linux-mm.kvack.org archive mirror
* freemaps
@ 2002-12-14 22:47 Frederic Rossi (LMC)
  2002-12-15  3:09 ` freemaps Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Frederic Rossi (LMC) @ 2002-12-14 22:47 UTC (permalink / raw)
  To: Andrew Morton, Ingo Molnar; +Cc: linux-mm

[-- Attachment #1: Type: text/plain, Size: 2086 bytes --]

Hi,

This patch is intended to provide better support for allocations done with
mmap. As mentioned by Ingo Molnar, keeping the last hole doesn't save us
from scanning vmas because of size constraints.

The following patch implements a simple scheme. The goal is to return
virtual addresses quickly, without scanning the address space at all. It
also works for fixed mappings by reserving areas of virtual addresses.

A global (per-mm) cache is maintained from which we can pick ranges of
addresses. There is no link to the vmas: when addresses are cached out they
are simply forgotten, and it is then up to the vma users to explicitly cache
areas out and back in.

Two of the main functions used for allocations are:

vma_cache_detach(len) returns the first available range of this length
starting from the first block in the cache. The overhead is still very low
since only the cache is scanned (using a first-fit scheme).

vma_cache_area(addr, len) is used to reserve an area from addr to addr+len.
This also makes it possible to reserve a bunch of virtual addresses in
advance without explicitly mapping them.

The cache is also accessible through /proc/<pid>/freemaps, which shows the
free areas along with their lengths. Applications that want to map at fixed
addresses can check what is available before doing a mmap().
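
For instance, something along these lines could pick a hole before doing a
fixed mapping (userspace sketch only, minimal error handling; the ranges
reported by the kernel are page aligned):

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

/* map 'len' bytes at the first hole large enough, as reported by freemaps */
static void *map_from_freemaps(size_t len)
{
	char path[64];
	unsigned long start, end, size;
	void *p = MAP_FAILED;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/freemaps", getpid());
	f = fopen(path, "r");
	if (!f)
		return MAP_FAILED;

	/* each line is "%lx-%lx %lu\n": start, end, length of a free range */
	while (fscanf(f, "%lx-%lx %lu", &start, &end, &size) == 3) {
		if (size >= len) {
			p = mmap((void *)start, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
			break;
		}
	}
	fclose(f);
	return p;
}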

I guess it could also give more control over fragmentation of the virtual
address space. I successfully ran this patch in a desktop environment, and
did some simple testing to see how get_unmapped_area() reacts. Running
kernel 2.5.50 I get:
100000 mmaps, mmap=2545 msec munmap=59 msec
100000 mmaps, mmap=2545 msec munmap=58 msec
100000 mmaps, mmap=2544 msec munmap=60 msec
100000 mmaps, mmap=2547 msec munmap=60 msec

and with freemaps I get:
100000 mmaps, mmap=79 msec munmap=60 msec
100000 mmaps, mmap=79 msec munmap=60 msec
100000 mmaps, mmap=80 msec munmap=60 msec
100000 mmaps, mmap=79 msec munmap=60 msec
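
(The test is essentially the loop below; this is an illustrative sketch
rather than the exact program I ran, so flags and sizes may differ.)

#include <stdio.h>
#include <sys/mman.h>
#include <sys/time.h>

#define N 100000

static long elapsed_msec(struct timeval *a, struct timeval *b)
{
	return (b->tv_sec - a->tv_sec) * 1000 + (b->tv_usec - a->tv_usec) / 1000;
}

int main(void)
{
	static void *p[N];
	struct timeval t0, t1, t2;
	int i;

	gettimeofday(&t0, NULL);
	for (i = 0; i < N; i++)		/* 100000 small anonymous mappings */
		p[i] = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	gettimeofday(&t1, NULL);
	for (i = 0; i < N; i++)
		munmap(p[i], 4096);
	gettimeofday(&t2, NULL);

	printf("%d mmaps, mmap=%ld msec munmap=%ld msec\n",
	       N, elapsed_msec(&t0, &t1), elapsed_msec(&t1, &t2));
	return 0;
}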

Since there is quite an amazing difference, I would really like to have
your comments on this.

I have attached a patch against 2.5.50.

Regards,
Frederic

[-- Attachment #2: freemaps-2.5.50-p4.patch --]
[-- Type: application/octet-stream, Size: 24750 bytes --]

diff --exclude=RCS -BbNur linux-2.5.50/fs/binfmt_aout.c linux-2.5.50~/fs/binfmt_aout.c
--- linux-2.5.50/fs/binfmt_aout.c	Wed Nov 27 17:35:51 2002
+++ linux-2.5.50~/fs/binfmt_aout.c	Thu Dec 12 14:51:05 2002
@@ -24,6 +24,7 @@
 #include <linux/binfmts.h>
 #include <linux/personality.h>
 #include <linux/init.h>
+#include <linux/vma_cache.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -307,7 +308,8 @@
 		(current->mm->start_data = N_DATADDR(ex));
 	current->mm->brk = ex.a_bss +
 		(current->mm->start_brk = N_BSSADDR(ex));
-	current->mm->free_area_cache = TASK_UNMAPPED_BASE;
+
+	vma_cache_init (current->mm);
 
 	current->mm->rss = 0;
 	current->mm->mmap = NULL;
diff --exclude=RCS -BbNur linux-2.5.50/fs/binfmt_elf.c linux-2.5.50~/fs/binfmt_elf.c
--- linux-2.5.50/fs/binfmt_elf.c	Wed Nov 27 17:35:59 2002
+++ linux-2.5.50~/fs/binfmt_elf.c	Thu Dec 12 14:51:05 2002
@@ -35,6 +35,7 @@
 #include <linux/compiler.h>
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
+#include <linux/vma_cache.h>
 
 #include <asm/uaccess.h>
 #include <asm/param.h>
@@ -617,7 +618,9 @@
 	/* Do this so that we can load the interpreter, if need be.  We will
 	   change some of these later */
 	current->mm->rss = 0;
-	current->mm->free_area_cache = TASK_UNMAPPED_BASE;
+ 	
+ 	vma_cache_init (current->mm);
+
 	retval = setup_arg_pages(bprm);
 	if (retval < 0) {
 		send_sig(SIGKILL, current, 0);
diff --exclude=RCS -BbNur linux-2.5.50/fs/proc/array.c linux-2.5.50~/fs/proc/array.c
--- linux-2.5.50/fs/proc/array.c	Wed Nov 27 17:36:05 2002
+++ linux-2.5.50~/fs/proc/array.c	Thu Dec 12 14:51:05 2002
@@ -73,6 +73,7 @@
 #include <linux/highmem.h>
 #include <linux/file.h>
 #include <linux/times.h>
+#include <linux/vma_cache.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -513,6 +514,99 @@
 	return len;
 }
 
+ssize_t proc_pid_read_vmc (struct task_struct *task, struct file * file, char * buf, size_t count, loff_t *ppos)
+{
+	struct mm_struct *mm;
+	struct vm_area_struct * map;
+	char *tmp, *kbuf;
+	long retval;
+	int off, lineno, loff;
+	struct list_head *h;
+
+	/* reject calls with out of range parameters immediately */
+	retval = 0;
+	if (*ppos > LONG_MAX)
+		goto out;
+	if (count == 0)
+		goto out;
+	off = (long)*ppos;
+	/*
+	 * We might sleep getting the page, so get it first.
+	 */
+	retval = -ENOMEM;
+	kbuf = (char*)__get_free_page(GFP_KERNEL);
+	if (!kbuf)
+		goto out;
+
+	tmp = (char*)__get_free_page(GFP_KERNEL);
+	if (!tmp)
+		goto out_free1;
+
+	task_lock(task);
+	mm = task->mm;
+	if (mm)
+		atomic_inc(&mm->mm_users);
+	task_unlock(task);
+	retval = 0;
+	if (!mm)
+		goto out_free2;
+
+	down_read(&mm->mmap_sem);
+	map = mm->mmap;
+	lineno = 0;
+	loff = 0;
+	if (count > PAGE_SIZE)
+		count = PAGE_SIZE;
+
+	list_for_each (h, &mm->vma_cache_head) {
+		struct vma_cache_struct *vmc = list_entry (h, struct vma_cache_struct, head);
+		
+		int len;
+		if (off > PAGE_SIZE) {
+			off -= PAGE_SIZE;
+			goto next;
+		}
+
+		len = sprintf (tmp, "%lx-%lx %lu\n", vmc->vm_start, vmc->vm_end, vmc->vm_end-vmc->vm_start);
+		len -= off;
+		if (len > 0) {
+			if (retval+len > count) {
+				/* only partial line transfer possible */
+				len = count - retval;
+				/* save the offset where the next read
+				 * must start */
+				loff = len+off;
+			}
+			memcpy(kbuf+retval, tmp+off, len);
+			retval += len;
+		}
+		off = 0;
+next:
+		if (!loff)
+			lineno++;
+		if (retval >= count)
+			break;
+		if (loff) BUG();
+		map = map->vm_next;
+	}
+
+	up_read(&mm->mmap_sem);
+	mmput(mm);
+
+	if (retval > count) BUG();
+	if (copy_to_user(buf, kbuf, retval))
+		retval = -EFAULT;
+	else
+		*ppos = (lineno << PAGE_SHIFT) + loff;
+
+out_free2:
+	free_page((unsigned long)tmp);
+out_free1:
+	free_page((unsigned long)kbuf);
+out:
+	return retval;
+}
+
 ssize_t proc_pid_read_maps (struct task_struct *task, struct file * file, char * buf,
 			  size_t count, loff_t *ppos)
 {
diff --exclude=RCS -BbNur linux-2.5.50/fs/proc/base.c linux-2.5.50~/fs/proc/base.c
--- linux-2.5.50/fs/proc/base.c	Wed Nov 27 17:36:06 2002
+++ linux-2.5.50~/fs/proc/base.c	Thu Dec 12 14:51:05 2002
@@ -54,6 +54,7 @@
 	PROC_PID_CMDLINE,
 	PROC_PID_STAT,
 	PROC_PID_STATM,
+	PROC_PID_VMC,
 	PROC_PID_MAPS,
 	PROC_PID_MOUNTS,
 	PROC_PID_WCHAN,
@@ -75,6 +76,7 @@
   E(PROC_PID_CMDLINE,	"cmdline",	S_IFREG|S_IRUGO),
   E(PROC_PID_STAT,	"stat",		S_IFREG|S_IRUGO),
   E(PROC_PID_STATM,	"statm",	S_IFREG|S_IRUGO),
+  E(PROC_PID_VMC,	"freemaps",	S_IFREG|S_IRUGO),
   E(PROC_PID_MAPS,	"maps",		S_IFREG|S_IRUGO),
   E(PROC_PID_MEM,	"mem",		S_IFREG|S_IRUSR|S_IWUSR),
   E(PROC_PID_CWD,	"cwd",		S_IFLNK|S_IRWXUGO),
@@ -99,6 +101,7 @@
 }
 
 ssize_t proc_pid_read_maps(struct task_struct*,struct file*,char*,size_t,loff_t*);
+ssize_t proc_pid_read_vmc(struct task_struct*,struct file*,char*,size_t,loff_t*);
 int proc_pid_stat(struct task_struct*,char*);
 int proc_pid_status(struct task_struct*,char*);
 int proc_pid_statm(struct task_struct*,char*);
@@ -336,6 +339,21 @@
 	.read		= pid_maps_read,
 };
 
+static ssize_t pid_vmc_read(struct file * file, char * buf,
+			      size_t count, loff_t *ppos)
+{
+	struct inode * inode = file->f_dentry->d_inode;
+	struct task_struct *task = proc_task(inode);
+	ssize_t res;
+
+	res = proc_pid_read_vmc (task, file, buf, count, ppos);
+	return res;
+}
+
+static struct file_operations proc_vmc_operations = {
+	read:		pid_vmc_read,
+};
+
 extern struct seq_operations mounts_op;
 static int mounts_open(struct inode *inode, struct file *file)
 {
@@ -1023,6 +1041,9 @@
 			inode->i_fop = &proc_info_file_operations;
 			ei->op.proc_read = proc_pid_statm;
 			break;
+		case PROC_PID_VMC:
+			inode->i_fop = &proc_vmc_operations;
+			break;
 		case PROC_PID_MAPS:
 			inode->i_fop = &proc_maps_operations;
 			break;
diff --exclude=RCS -BbNur linux-2.5.50/include/linux/init_task.h linux-2.5.50~/include/linux/init_task.h
--- linux-2.5.50/include/linux/init_task.h	Wed Nov 27 17:35:49 2002
+++ linux-2.5.50~/include/linux/init_task.h	Thu Dec 12 14:51:05 2002
@@ -41,6 +41,7 @@
 	.page_table_lock =  SPIN_LOCK_UNLOCKED, 		\
 	.mmlist		= LIST_HEAD_INIT(name.mmlist),		\
 	.default_kioctx = INIT_KIOCTX(name.default_kioctx, name),	\
+        .vma_cache_head = LIST_HEAD_INIT(name.vma_cache_head),  \
 }
 
 #define INIT_SIGNALS(sig) {	\
diff --exclude=RCS -BbNur linux-2.5.50/include/linux/sched.h linux-2.5.50~/include/linux/sched.h
--- linux-2.5.50/include/linux/sched.h	Wed Nov 27 17:35:49 2002
+++ linux-2.5.50~/include/linux/sched.h	Thu Dec 12 14:51:05 2002
@@ -172,11 +172,16 @@
 
 #include <linux/aio.h>
 
+struct vma_cache_struct {
+	struct list_head head;
+	unsigned long vm_start;
+	unsigned long vm_end;
+};
+
 struct mm_struct {
 	struct vm_area_struct * mmap;		/* list of VMAs */
 	struct rb_root mm_rb;
 	struct vm_area_struct * mmap_cache;	/* last find_vma result */
-	unsigned long free_area_cache;		/* first hole */
 	pgd_t * pgd;
 	atomic_t mm_users;			/* How many users with user space? */
 	atomic_t mm_count;			/* How many references to "struct mm_struct" (users count as 1) */
@@ -189,6 +194,8 @@
 						 * by mmlist_lock
 						 */
 
+        struct list_head vma_cache_head;        /* cache for free virtual address areas */
+
 	unsigned long start_code, end_code, start_data, end_data;
 	unsigned long start_brk, brk, start_stack;
 	unsigned long arg_start, arg_end, env_start, env_end;
diff --exclude=RCS -BbNur linux-2.5.50/include/linux/slab.h linux-2.5.50~/include/linux/slab.h
--- linux-2.5.50/include/linux/slab.h	Wed Nov 27 17:36:23 2002
+++ linux-2.5.50~/include/linux/slab.h	Thu Dec 12 14:51:05 2002
@@ -65,6 +65,7 @@
 extern int FASTCALL(kmem_cache_reap(int));
 
 /* System wide caches */
+extern kmem_cache_t	*vma_cache_cachep;
 extern kmem_cache_t	*vm_area_cachep;
 extern kmem_cache_t	*mm_cachep;
 extern kmem_cache_t	*names_cachep;
diff --exclude=RCS -BbNur linux-2.5.50/include/linux/vma_cache.h linux-2.5.50~/include/linux/vma_cache.h
--- linux-2.5.50/include/linux/vma_cache.h	Wed Dec 31 19:00:00 1969
+++ linux-2.5.50~/include/linux/vma_cache.h	Thu Dec 12 14:51:05 2002
@@ -0,0 +1,25 @@
+#ifndef _VMA_CACHE_H_
+#define _VMA_CACHE_H_
+
+#define VMA_CACHE_EMPTY(m,a)    (list_empty(&(a)->head))
+
+#define vma_cache_alloc()     (kmem_cache_alloc(vma_cache_cachep, GFP_KERNEL))
+#define vma_cache_free(v)                              \
+do {                                                   \
+        if (v) {                                       \
+                list_del_init (&(v)->head);            \
+                kmem_cache_free (vma_cache_cachep, v); \
+        }                                              \
+} while (0)
+
+int vma_cache_init (struct mm_struct *);
+struct vma_cache_struct *vma_cache_find (struct mm_struct *, unsigned long, int);
+void vma_cache_shtdn (struct mm_struct *);
+int vma_cache_attach (struct mm_struct *, unsigned long, int);
+int vma_cache_merge (struct mm_struct *, struct vma_cache_struct *, unsigned long, int);
+int vma_cache_clone (struct mm_struct *, struct mm_struct *);
+unsigned long vma_cache_area (struct mm_struct *, unsigned long, int);
+unsigned long vma_cache_detach (struct mm_struct *, int);
+
+#endif /* #ifndef _VMA_CACHE_H_ */
+
diff --exclude=RCS -BbNur linux-2.5.50/kernel/exit.c linux-2.5.50~/kernel/exit.c
--- linux-2.5.50/kernel/exit.c	Wed Nov 27 17:36:18 2002
+++ linux-2.5.50~/kernel/exit.c	Thu Dec 12 14:51:05 2002
@@ -21,6 +21,7 @@
 #include <linux/ptrace.h>
 #include <linux/profile.h>
 #include <linux/mount.h>
+#include <linux/vma_cache.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
diff --exclude=RCS -BbNur linux-2.5.50/kernel/fork.c linux-2.5.50~/kernel/fork.c
--- linux-2.5.50/kernel/fork.c	Wed Nov 27 17:35:49 2002
+++ linux-2.5.50~/kernel/fork.c	Thu Dec 12 14:51:05 2002
@@ -29,6 +29,7 @@
 #include <linux/futex.h>
 #include <linux/ptrace.h>
 #include <linux/mount.h>
+#include <linux/vma_cache.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -215,10 +216,10 @@
 
 	down_write(&oldmm->mmap_sem);
 	flush_cache_mm(current->mm);
+	vma_cache_init (mm);
 	mm->locked_vm = 0;
 	mm->mmap = NULL;
 	mm->mmap_cache = NULL;
-	mm->free_area_cache = TASK_UNMAPPED_BASE;
 	mm->map_count = 0;
 	mm->rss = 0;
 	mm->cpu_vm_mask = 0;
@@ -246,6 +247,7 @@
 				goto fail_nomem;
 			charge += len;
 		}
+		vma_cache_area (mm, mpnt->vm_start, mpnt->vm_end-mpnt->vm_start);
 		tmp = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
 		if (!tmp)
 			goto fail_nomem;
@@ -334,7 +336,7 @@
 	mm->page_table_lock = SPIN_LOCK_UNLOCKED;
 	mm->ioctx_list_lock = RW_LOCK_UNLOCKED;
 	mm->default_kioctx = (struct kioctx)INIT_KIOCTX(mm->default_kioctx, *mm);
-	mm->free_area_cache = TASK_UNMAPPED_BASE;
+	vma_cache_init (mm);
 
 	if (likely(!mm_alloc_pgd(mm)))
 		return mm;
@@ -381,6 +383,7 @@
 		mmlist_nr--;
 		spin_unlock(&mmlist_lock);
 		exit_mmap(mm);
+		vma_cache_shtdn(mm);
 		mmdrop(mm);
 	}
 }
@@ -1054,6 +1057,9 @@
 /* SLAB cache for mm_struct structures (tsk->mm) */
 kmem_cache_t *mm_cachep;
 
+/* SLAB cache for vma_cache structures */
+kmem_cache_t *vma_cache_cachep;
+
 void __init proc_caches_init(void)
 {
 	sigact_cachep = kmem_cache_create("signal_act",
@@ -1085,4 +1091,11 @@
 			SLAB_HWCACHE_ALIGN, NULL, NULL);
 	if(!mm_cachep)
 		panic("vma_init: Cannot alloc mm_struct SLAB cache");
+
+	vma_cache_cachep = kmem_cache_create("vma_cache_struct",
+			sizeof(struct vma_cache_struct), 0,
+			SLAB_HWCACHE_ALIGN, NULL, NULL);
+	if(!vma_cache_cachep)
+		panic("vma_init: Cannot alloc vma_cache_struct SLAB cache");
+
 }
diff --exclude=RCS -BbNur linux-2.5.50/mm/Makefile linux-2.5.50~/mm/Makefile
--- linux-2.5.50/mm/Makefile	Wed Nov 27 17:36:14 2002
+++ linux-2.5.50~/mm/Makefile	Thu Dec 12 14:51:05 2002
@@ -8,7 +8,7 @@
 	    vmalloc.o slab.o bootmem.o swap.o vmscan.o page_alloc.o \
 	    oom_kill.o shmem.o highmem.o mempool.o msync.o mincore.o \
 	    readahead.o pdflush.o page-writeback.o rmap.o madvise.o \
-	    vcache.o truncate.o
+	    vcache.o truncate.o vma_cache.o
 
 obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o
 
diff --exclude=RCS -BbNur linux-2.5.50/mm/mmap.c linux-2.5.50~/mm/mmap.c
--- linux-2.5.50/mm/mmap.c	Wed Nov 27 17:36:06 2002
+++ linux-2.5.50~/mm/mmap.c	Thu Dec 12 14:51:05 2002
@@ -18,6 +18,7 @@
 #include <linux/security.h>
 #include <linux/hugetlb.h>
 #include <linux/profile.h>
+#include <linux/vma_cache.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgalloc.h>
@@ -514,6 +515,7 @@
 	if (vma && vma->vm_start < addr + len) {
 		if (do_munmap(mm, addr, len))
 			return -ENOMEM;
+		vma_cache_area (mm, addr, len);
 		goto munmap_back;
 	}
 
@@ -647,7 +649,6 @@
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
-	int found_hole = 0;
 
 	if (len > TASK_SIZE)
 		return -ENOMEM;
@@ -657,25 +658,9 @@
 		vma = find_vma(mm, addr);
 		if (TASK_SIZE - len >= addr &&
 		    (!vma || addr + len <= vma->vm_start))
-			return addr;
-	}
-	addr = mm->free_area_cache;
-
-	for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
-		/* At this point:  (!vma || addr < vma->vm_end). */
-		if (TASK_SIZE - len < addr)
-			return -ENOMEM;
-		/*
-		 * Record the first available hole.
-		 */
-		if (!found_hole && (!vma || addr < vma->vm_start)) {
-			mm->free_area_cache = addr;
-			found_hole = 1;
-		}
-		if (!vma || addr + len <= vma->vm_start)
-			return addr;
-		addr = vma->vm_end;
+			return vma_cache_area (current->mm, addr, len);
 	}
+	return vma_cache_detach (current->mm, len);
 }
 #else
 extern unsigned long arch_get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
@@ -688,7 +673,7 @@
 			return -ENOMEM;
 		if (addr & ~PAGE_MASK)
 			return -EINVAL;
-		return addr;
+		return vma_cache_area (current->mm, addr, len);
 	}
 
 	if (file && file->f_op && file->f_op->get_unmapped_area)
@@ -962,13 +947,8 @@
 	area->vm_mm->total_vm -= len >> PAGE_SHIFT;
 	if (area->vm_flags & VM_LOCKED)
 		area->vm_mm->locked_vm -= len >> PAGE_SHIFT;
-	/*
-	 * Is this a new hole at the lowest possible address?
-	 */
-	if (area->vm_start >= TASK_UNMAPPED_BASE &&
-				area->vm_start < area->vm_mm->free_area_cache)
-	      area->vm_mm->free_area_cache = area->vm_start;
 
+	vma_cache_attach (mm, area->vm_start, area->vm_end-area->vm_start);
 	remove_shared_vm_struct(area);
 
 	if (area->vm_ops && area->vm_ops->close)
@@ -1336,7 +1316,6 @@
 		kmem_cache_free(vm_area_cachep, mpnt);
 		mpnt = next;
 	}
-		
 }
 
 /* Insert vm structure into process list sorted by address
diff --exclude=RCS -BbNur linux-2.5.50/mm/vma_cache.c linux-2.5.50~/mm/vma_cache.c
--- linux-2.5.50/mm/vma_cache.c	Wed Dec 31 19:00:00 1969
+++ linux-2.5.50~/mm/vma_cache.c	Fri Dec 13 08:52:52 2002
@@ -0,0 +1,364 @@
+/*
+ *  linux/mm/vma_cache.c
+ * 
+ *  These routines are used for the allocation of virtual memory areas. Each MM context is 
+ *  assigned a cache which contains virtual address ranges a process can allocate to map into 
+ *  its own address space. These routines assume the caller holds the global mm->mmap_sem.
+ *  Dec 2002, frederic.rossi@ericsson.ca
+ *
+ */
+
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/errno.h>
+#include <linux/vma_cache.h>
+
+#define next_area(v) (list_entry ((v)->head.next, struct vma_cache_struct, head))
+#define prev_area(v) (list_entry ((v)->head.prev, struct vma_cache_struct, head))
+
+/* not mapped addresses */
+#define unmapped_addr(v,a)    ((a)>=(v)->vm_start && (a)<=(v)->vm_end)
+#define unmapped_area(v,a,l)  ((a)>=(v)->vm_start && (a)+(l)<=(v)->vm_end)
+#define unmapped_left(v,a,l)  ((a)+(l)<(v)->vm_start)
+#define unmapped_right(v,a,l) ((a)>(v)->vm_end)
+
+/* partially mapped addresses */
+#define pmapped_left(v,a,l)   ((a)+(l)==(v)->vm_start)
+#define pmapped_right(v,a,l)  ((a)==(v)->vm_end)
+
+static __inline__ int vma_cache_isvalid (struct mm_struct *mm, unsigned long addr, int len)
+{
+	return (addr >= TASK_UNMAPPED_BASE && addr+len <= TASK_SIZE)? 1 : 0;
+}	
+
+static __inline__ int vma_cache_chainout (struct mm_struct *mm, struct vma_cache_struct *vmc)
+{
+	if (!vmc)
+		return -EINVAL;
+
+	list_del_init (&vmc->head);
+	vma_cache_free (vmc);
+	return 0;
+}
+
+static __inline__ struct vma_cache_struct *vma_cache_append (struct mm_struct *mm, struct vma_cache_struct *vmc, unsigned long addr, int len)
+{
+	struct vma_cache_struct *new;
+
+	new = vma_cache_alloc ();
+	if (!new)
+		return NULL;
+
+	INIT_LIST_HEAD (&new->head);
+	new->vm_start = addr;
+	new->vm_end   = addr+len;
+	list_add (&new->head, &vmc->head);
+	
+	return new;
+}
+
+static __inline__ struct vma_cache_struct *vma_cache_insert (struct mm_struct *mm, struct vma_cache_struct *vmc, unsigned long addr, int len)
+{
+	struct vma_cache_struct *new;
+
+	new = vma_cache_alloc ();
+	if (!new)
+		return NULL;
+
+	INIT_LIST_HEAD (&new->head);
+	new->vm_start = addr;
+	new->vm_end   = addr+len;
+	list_add_tail (&new->head, &vmc->head);
+	
+	return new;
+}
+
+void vma_cache_shtdn (struct mm_struct *mm)
+{
+	struct vma_cache_struct *nptr, *vmc, *head;
+	
+	head = list_entry (&mm->vma_cache_head, struct vma_cache_struct, head);
+	vmc  = head;
+	nptr = NULL;
+
+	while (nptr != head) {
+		nptr = next_area (vmc);
+		if (vma_cache_attach (mm, vmc->vm_start, vmc->vm_end-vmc->vm_start) == 0)
+			vma_cache_free (vmc);
+		vmc = nptr;
+	}
+}
+
+int vma_cache_clone (struct mm_struct *mm_dst, struct mm_struct *mm_src)
+{
+	struct list_head *tmp, *prev;
+	struct vma_cache_struct *new;
+
+	if (!list_empty (&mm_dst->vma_cache_head)) {
+		/* Remove previous mapping */
+	again:
+		list_for_each (tmp, &mm_dst->vma_cache_head) {
+			struct vma_cache_struct *vmc = list_entry (tmp, struct vma_cache_struct, head);
+			vma_cache_free (vmc);
+			goto again;
+		}
+		INIT_LIST_HEAD (&mm_dst->vma_cache_head);
+	}
+		
+	prev = &mm_dst->vma_cache_head;
+	list_for_each (tmp, &mm_src->vma_cache_head) {
+		struct vma_cache_struct *vmc = list_entry (tmp, struct vma_cache_struct, head);
+
+		new =  vma_cache_alloc ();
+		if (!new)
+			return -ENOMEM;
+		
+		INIT_LIST_HEAD (&new->head);
+		new->vm_start = vmc->vm_start;
+		new->vm_end   = vmc->vm_end;
+		list_add (&new->head, prev);
+		prev = &new->head;
+	}
+
+	return 0;
+}
+
+int vma_cache_init (struct mm_struct *mm)
+{
+	struct vma_cache_struct *vmc;
+	
+        INIT_LIST_HEAD (&mm->vma_cache_head);
+
+	vmc =  vma_cache_alloc ();
+	if (!vmc)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD (&vmc->head);
+	vmc->vm_start = TASK_UNMAPPED_BASE;
+	vmc->vm_end   = TASK_SIZE;
+	list_add (&vmc->head, &mm->vma_cache_head);
+
+	return 0;
+}
+
+struct vma_cache_struct *vma_cache_find (struct mm_struct *mm, unsigned long addr, int len)
+{
+	struct list_head *tmp;
+
+	if (!len)
+		return NULL;
+
+	list_for_each (tmp, &mm->vma_cache_head) {
+		struct vma_cache_struct *vmc = list_entry (tmp, struct vma_cache_struct, head);
+		if (unmapped_area (vmc, addr, len))
+			return vmc;
+	}
+
+	return NULL;
+}
+
+/* 
+ * Given an address, find the cache element to the left and merge with it if possible.
+ */
+static struct vma_cache_struct *vma_cache_merge_left (struct mm_struct *mm, struct vma_cache_struct *vmc, unsigned long addr, int len)
+{
+	struct vma_cache_struct *vmc_left = NULL;
+
+	if (!vmc) {
+		vmc = vma_cache_find (mm, addr, len);
+		if (!vmc) goto out;
+	}
+
+	vmc_left = prev_area (vmc);
+
+	if (!vmc_left)
+		goto out;
+
+	if (vmc_left == vmc)
+		goto out;
+
+	/* merge */
+	if (vmc->vm_start <= vmc_left->vm_end && vmc->vm_start >= vmc_left->vm_start){ 
+		vmc_left->vm_end = vmc->vm_end;
+		vma_cache_chainout (mm, vmc);
+	}
+
+out:
+	return vmc_left;
+}
+
+/* 
+ * Given an address, find the cache element to the right and merge with it if possible.
+ */
+static struct vma_cache_struct *vma_cache_merge_right (struct mm_struct *mm, struct vma_cache_struct *vmc, unsigned long addr, int len)
+{
+	struct vma_cache_struct *vmc_right = NULL;
+
+	if (!vmc) {
+		vmc = vma_cache_find (mm, addr, len);
+		if (!vmc) goto out;
+	}
+
+	vmc_right = next_area (vmc);
+
+	if (!vmc_right)
+		goto out;
+
+	if (vmc_right == vmc)
+		goto out;
+
+	/* merge */
+	if (vmc->vm_end >= vmc_right->vm_start && vmc->vm_end <= vmc_right->vm_end) {
+		vmc_right->vm_start = vmc->vm_start;
+		vma_cache_chainout (mm, vmc);
+	}
+out:
+	return vmc_right;
+}
+
+int vma_cache_merge (struct mm_struct *mm, struct vma_cache_struct *vmc, unsigned long addr, int len)
+{
+	int err=0;
+
+	if (!vmc || vma_cache_find (mm, addr, len) == vmc) {
+		err  = (int )vma_cache_merge_left (mm, vmc, addr, len);
+		err |= (int )vma_cache_merge_right (mm, vmc, addr, len);
+	}
+
+	return (err)? -EINVAL : 0;
+}
+
+/*
+ * Insert a new area or grow an existing one. Returns 0 if successful.
+ */
+int vma_cache_attach (struct mm_struct *mm, unsigned long addr, int len)
+{
+	struct list_head *tmp;
+	struct vma_cache_struct *vmc;
+
+	if (!vma_cache_isvalid (mm, addr, len))
+		return -EINVAL;
+
+	vmc = list_entry (&mm->vma_cache_head, struct vma_cache_struct, head);
+	list_for_each (tmp, &mm->vma_cache_head) {
+		vmc = list_entry (tmp, struct vma_cache_struct, head);
+		
+		if (addr > vmc->vm_end)
+			continue;
+
+		if (pmapped_left (vmc, addr, len)) { 
+			vmc->vm_start = addr;
+			return vma_cache_merge (mm, vmc, addr, len);
+		}
+
+		if (unmapped_left (vmc, addr, len)) {
+			struct vma_cache_struct *new = vma_cache_insert (mm, vmc, addr, len);
+			if (!new)
+				return -ENOMEM;
+			return vma_cache_merge (mm, new, addr, len);
+		}
+		
+		if (pmapped_right (vmc, addr, len)) {
+			vmc->vm_end = addr+len;
+			return vma_cache_merge (mm, vmc, addr, len);
+		}
+
+		if (unmapped_area (vmc, addr, len))
+			return 0;
+
+		if (addr < vmc->vm_start && addr+len > vmc->vm_end) {
+			vmc->vm_start = addr;
+			vmc->vm_end   = addr+len;
+			return vma_cache_merge (mm, vmc, addr, len);
+		}
+
+		if (addr == vmc->vm_start && addr+len > vmc->vm_end) {
+			vmc->vm_end = addr+len;
+			return vma_cache_merge (mm, vmc, addr, len);
+		}
+	
+		/* shouldn't go here */
+		printk ("vma_cache_attach: task '%s' [%d]. Don't know how to attach address range [%lx, %lx] in range [%lx, %lx]\n", 
+		      current->comm, current->pid, addr, addr+len, vmc->vm_start, vmc->vm_end);
+
+		break;
+	}
+
+	return -EINVAL;
+}
+
+/*
+ * Reserve a range of addresses. An error value is returned only if it's not
+ * possible to reserve this area.
+ */
+unsigned long vma_cache_area (struct mm_struct *mm, unsigned long addr, int len)
+{
+	struct vma_cache_struct *vmc;
+
+	if (!vma_cache_isvalid (mm, addr, len))
+		return addr;
+
+	vmc = vma_cache_find (mm, addr, len);
+	if (!vmc)
+		return addr;
+
+	if (addr == vmc->vm_start && addr+len <= vmc->vm_end) {
+		vmc->vm_start = addr+len;
+		if (vmc->vm_start == vmc->vm_end)
+			vma_cache_chainout (mm, vmc);
+		return addr;
+	}
+	
+	if (addr > vmc->vm_start && addr+len == vmc->vm_end) {
+		vmc->vm_end = addr;
+		if (vmc->vm_start == vmc->vm_end)
+			vma_cache_chainout (mm, vmc);
+		return addr;
+	}
+	
+	if (addr > vmc->vm_start && addr+len < vmc->vm_end) {
+		unsigned long old_end;
+
+		/* vmc->vm_start -> addr */
+		old_end = vmc->vm_end;
+		vmc->vm_end = addr;
+
+		/* addr+len -> vmc->vm_end */
+		vma_cache_append (mm, vmc, addr+len, old_end-addr-len);
+		return addr;
+	}
+
+	printk ("vma_cache_area: task '%s' [%d]. Don't know how to cache out address range [%lx, %lx] from range [%lx, %lx]\n", 
+		current->comm, current->pid, addr, addr+len, vmc->vm_start, vmc->vm_end);
+
+	return -EINVAL;
+}
+
+/* 
+ * Find the first available range of addresses. The return value
+ * is to satisfy get_unmapped_area().
+ */
+unsigned long vma_cache_detach (struct mm_struct *mm, int len)
+{
+	struct list_head *tmp;
+	unsigned long addr;
+
+	if (!len)
+		return -EINVAL;
+
+	addr = -ENOMEM;
+
+	list_for_each (tmp, &mm->vma_cache_head) {
+		struct vma_cache_struct *vmc = list_entry (tmp, struct vma_cache_struct, head);
+		
+		if (len <= vmc->vm_end - vmc->vm_start) {
+			addr = vmc->vm_start;
+			vmc->vm_start += len;
+			if (vmc->vm_start == vmc->vm_end)
+				vma_cache_chainout (mm, vmc);
+			break;
+		}
+	}
+	
+	return addr;
+}


* Re: freemaps
  2002-12-14 22:47 freemaps Frederic Rossi (LMC)
@ 2002-12-15  3:09 ` Andrew Morton
  2002-12-15  8:41   ` freemaps Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2002-12-15  3:09 UTC (permalink / raw)
  To: Frederic Rossi (LMC); +Cc: Ingo Molnar, linux-mm

"Frederic Rossi (LMC)" wrote:
> 
> ...
> 100000 mmaps, mmap=2545 msec munmap=59 msec
> 100000 mmaps, mmap=2545 msec munmap=58 msec
> 100000 mmaps, mmap=2544 msec munmap=60 msec
> 100000 mmaps, mmap=2547 msec munmap=60 msec
> 
> and with freemaps I get
> 100000 mmaps, mmap=79 msec munmap=60 msec
> 100000 mmaps, mmap=79 msec munmap=60 msec
> 100000 mmaps, mmap=80 msec munmap=60 msec
> 100000 mmaps, mmap=79 msec munmap=60 msec
> 

Yes, this is a real failing.

>  
> +ssize_t proc_pid_read_vmc (struct task_struct *task, struct file * file, char * buf, size_t count, loff_t *ppos)
> +{

This should use the seq_file API.
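Roughly this shape (names invented; the ->open/->release ends, which would
pin the mm the way proc_pid_read_vmc does with mm_users and stash it in
seq_file->private, are not shown):

/* <linux/seq_file.h> iterator over mm->vma_cache_head, under mmap_sem */
static void *vmc_seq_start(struct seq_file *m, loff_t *pos)
{
	struct mm_struct *mm = m->private;
	struct list_head *p = mm->vma_cache_head.next;
	loff_t n = *pos;

	down_read(&mm->mmap_sem);
	while (n-- && p != &mm->vma_cache_head)
		p = p->next;
	return p != &mm->vma_cache_head ? p : NULL;
}

static void *vmc_seq_next(struct seq_file *m, void *v, loff_t *pos)
{
	struct mm_struct *mm = m->private;
	struct list_head *p = ((struct list_head *)v)->next;

	(*pos)++;
	return p != &mm->vma_cache_head ? p : NULL;
}

static void vmc_seq_stop(struct seq_file *m, void *v)
{
	struct mm_struct *mm = m->private;

	up_read(&mm->mmap_sem);
}

static int vmc_seq_show(struct seq_file *m, void *v)
{
	struct vma_cache_struct *vmc = list_entry(v, struct vma_cache_struct, head);

	seq_printf(m, "%lx-%lx %lu\n", vmc->vm_start, vmc->vm_end,
		   vmc->vm_end - vmc->vm_start);
	return 0;
}

static struct seq_operations vmc_seq_ops = {
	.start	= vmc_seq_start,
	.next	= vmc_seq_next,
	.stop	= vmc_seq_stop,
	.show	= vmc_seq_show,
};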

> +struct vma_cache_struct {
> +	struct list_head head;
> +	unsigned long vm_start;
> +	unsigned long vm_end;
> +};

So this is the key part.  It is a per-mm linear list of unmapped areas.

You state that its locking is via mm->mmap_sem.  I assume that means
a down_write() of that semaphore?

As this is a linear list, I do not understand why it does not have similar failure
modes to the current search.  Suppose this list describes 100,000 4k unmapped
areas and the application requests an 8k mmap??

> +static __inline__ int vma_cache_chainout (struct mm_struct *mm, struct vma_cache_struct *vmc)
> +{
> +	if (!vmc)
> +		return -EINVAL;
> +
> +	list_del_init (&vmc->head);
> +	vma_cache_free (vmc);

vma_cache_free() already does the list_del_init().

* Re: freemaps
  2002-12-15  3:09 ` freemaps Andrew Morton
@ 2002-12-15  8:41   ` Ingo Molnar
  2002-12-15  9:03     ` freemaps Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2002-12-15  8:41 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Frederic Rossi (LMC), linux-mm

On Sat, 14 Dec 2002, Andrew Morton wrote:

> So this is the key part.  It is a per-mm linear list of unmapped areas.

yep.

> As this is a linear list, I do not understand why it does not have
> similar failure modes to the current search.  Suppose this list
> describes 100,000 4k unmapped areas and the application requests an 8k
> mmap??

i bet because in the (artificial?) test presented the 'hole' distribution
is much nicer than the 'allocated areas' distribution. There are real-life
allocation patterns where this scheme suffers the same kind of regression
the old scheme suffered. E.g. the one i presented originally, which
triggered the regression: the NPTL stack allocator, where there is a 4K
unmapped area between thread stacks. Under that workload the
freemap-patched kernel will suffer badly compared to 2.5.50.

but the hole-index might in fact be a quality improvement in a number of
cases. It won't save us in a number of other cases, but one thing is sure:
holes do not have types, allocated vmas do. So the complexity of the
hole-space will always be at least as low as the complexity of the
allocated-area-space. If there are lots of vmas of different types then
the complexity of the hole-space can be significantly lower than the
complexity of the vma-space. This happens quite often.

a hybrid approach would keep both improvements: take the hole index _plus_
the free-area cache i introduced, and if there's a cachemiss, search the
hole-list.
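i.e. something like this in get_unmapped_area() (sketch only, keeping
mm->free_area_cache around as a hint in front of the hole list):

	addr = mm->free_area_cache;
	vma = find_vma(mm, addr);
	/* cache hit: the remembered hole is still free and big enough */
	if (TASK_SIZE - len >= addr && (!vma || addr + len <= vma->vm_start))
		return addr;
	/* cache miss: first-fit scan of the hole list */
	return vma_cache_detach(mm, len);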

another approach might be to maintain some sort of tree of holes.

	Ingo


* Re: freemaps
  2002-12-15  8:41   ` freemaps Ingo Molnar
@ 2002-12-15  9:03     ` Andrew Morton
  2002-12-15  9:36       ` freemaps Ingo Molnar
  2002-12-16  0:51       ` freemaps William Lee Irwin III
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2002-12-15  9:03 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Frederic Rossi (LMC), linux-mm

Ingo Molnar wrote:
> 
> ...
> another approach might be to maintain some sort of tree of holes.

This one, I'd suggest.  If we're going to fix this we may as
well fix it right.  Otherwise there will always be whacky failure
modes.

Trees are tricky, because we don't like to recur.

I expect this could be solved with two trees:

- For searching, a radix-tree indexed by hole size.  A list
  of same-sized holes at each leaf.

- For insertion (where we must perform merging) an rbtree.
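
Something like this, structurally (field names invented, purely a sketch):

struct hole {
	unsigned long start, end;	/* the unmapped range [start, end) */
	struct rb_node rb;		/* rbtree keyed by address, for merging at munmap */
	struct list_head same_size;	/* hangs off the size-indexed radix tree leaf */
};

struct mm_holes {
	struct rb_root by_addr;
	struct radix_tree_root by_size;	/* key: (end - start) >> PAGE_SHIFT */
};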

But:

- Do we need to keep the lists of same-sized holes sorted by
  virtual address, to avoid fragmentation?

- Do all mm's incur all this stuff, or do we build it all when
  some threshold is crossed?

- How does it play with non-linear mappings?

* Re: freemaps
  2002-12-15  9:03     ` freemaps Andrew Morton
@ 2002-12-15  9:36       ` Ingo Molnar
  2002-12-16  0:51       ` freemaps William Lee Irwin III
  1 sibling, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2002-12-15  9:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Frederic Rossi (LMC), linux-mm, Andrea Arcangeli

On Sun, 15 Dec 2002, Andrew Morton wrote:

> Ingo Molnar wrote:
> > 
> > ...
> > another approach might be to maintain some sort of tree of holes.
> 
> This one, I'd suggest.  If we're going to fix this we may as
> well fix it right.  Otherwise there will always be whacky failure
> modes.
> 
> Trees are tricky, because we don't like to recur.
> 
> I expect this could be solved with two trees:
> 
> - For searching, a radix-tree indexed by hole size.  A list
>   of same-sized holes at each leaf.
> 
> - For insertion (where we must perform merging) an rbtree.

yes. I suspect this is close to what Andrea has/had in mind?

> But:
> 
> - Do we need to keep the lists of same-sized holes sorted by
>   virtual address, to avoid fragmentation?

the best anti-fragmentation technique is i guess what we have now: to use
the smallest matching hole at the lowest possible address. I think we
should give up strong anti-fragmentation techniques only once 32-bit
address spaces have become a distant memory, definitely not these days,
when the problems created by the limits of the 32-bit address space are
probably at their peak point in history.

but, before we go forward, i'd really suggest running _real_ tests. The
free-area cache i added was to address one specific real-life workload.
Frederic's test i don't know. Areas where i'd suspect we still suck
somewhat are the JITs and the memory protectors, but the loss due to the
free-area searching has to be quantified. Obviously, until trees are
introduced there will always be regressions.

> - Do all mm's incur all this stuff, or do we build it all when
>   some threshold is crossed?

we had some sort of threshold ages ago - i'd rather have a fast
constructor for this stuff instead of trying to hide the costs by cutting
off the small-size cases. I think basically everything but benchmarks will
have more than just a few mappings.

> - How does it play with non-linear mappings?

nonlinear mappings do not care about the structure of vmas (and the
structure of inverse vmas) - they are special types of vmas.

	Ingo



* Re: freemaps
  2002-12-15  9:03     ` freemaps Andrew Morton
  2002-12-15  9:36       ` freemaps Ingo Molnar
@ 2002-12-16  0:51       ` William Lee Irwin III
  2002-12-16  1:03         ` freemaps Andrew Morton
  1 sibling, 1 reply; 8+ messages in thread
From: William Lee Irwin III @ 2002-12-16  0:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, Frederic Rossi (LMC), linux-mm

On Sun, Dec 15, 2002 at 01:03:26AM -0800, Andrew Morton wrote:
> I expect this could be solved with two trees:
> - For searching, a radix-tree indexed by hole size.  A list
>   of same-sized holes at each leaf.
> - For insertion (where we must perform merging) an rbtree.

It's not quite that easy, because what this appears to suggest
would result in best fit, not address-ordered first fit.

My first thought is a length-constrained address minimization on a 2-d
tree, quadtree, or 2D k-d-b tree keyed on address and length; the
duelling tree solutions aren't really capable of providing O(lg(n))
guarantees for address-ordered first fit (or implementing address-
ordered first fit at all). I have no clue how to do this both
nonrecursively and without frequently-dirtied temporary node linkage,
though, as it appears to require a work stack (or potentially even a
queue for prioritized searches) like many other graph searches. Also,
I don't have a proof in hand that this is really O(lg(n)) worst-case,
though there are relatively obvious intuitive reasons to suspect so.


On Sun, Dec 15, 2002 at 01:03:26AM -0800, Andrew Morton wrote:
> But:
> - Do we need to keep the lists of same-sized holes sorted by
>   virtual address, to avoid fragmentation?

Well, not so much to avoid fragmentation as to avoid worst cases.


On Sun, Dec 15, 2002 at 01:03:26AM -0800, Andrew Morton wrote:
> - Do all mm's incur all this stuff, or do we build it all when
>   some threshold is crossed?

Not all incur these kinds of problems; but it is probably expensive
to build non-incrementally once the threshold is crossed.


On Sun, Dec 15, 2002 at 01:03:26AM -0800, Andrew Morton wrote:
> - How does it play with non-linear mappings?

It doesn't care; they're just vma's parked on a virtual address range
like the rest of them.


Bill

* Re: freemaps
  2002-12-16  0:51       ` freemaps William Lee Irwin III
@ 2002-12-16  1:03         ` Andrew Morton
  2002-12-16  1:09           ` freemaps William Lee Irwin III
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2002-12-16  1:03 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Ingo Molnar, Frederic Rossi (LMC), linux-mm

William Lee Irwin III wrote:
> 
> On Sun, Dec 15, 2002 at 01:03:26AM -0800, Andrew Morton wrote:
> > - How does it play with non-linear mappings?
> 
> It doesn't care; they're just vma's parked on a virtual address range
> like the rest of them.
> 

But the searching needs are different.  If someone has a nonlinear mmap
of the 0-1M region of a file and then requests an mmap of the 4-5M region,
that can just be tacked onto the 0-1M mapping's vma (can't it?).

* Re: freemaps
  2002-12-16  1:03         ` freemaps Andrew Morton
@ 2002-12-16  1:09           ` William Lee Irwin III
  0 siblings, 0 replies; 8+ messages in thread
From: William Lee Irwin III @ 2002-12-16  1:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, Frederic Rossi (LMC), linux-mm

On Sun, Dec 15, 2002 at 01:03:26AM -0800, Andrew Morton wrote:
>>> - How does it play with non-linear mappings?

William Lee Irwin III wrote:
>> It doesn't care; they're just vma's parked on a virtual address range
>> like the rest of them.

On Sun, Dec 15, 2002 at 05:03:38PM -0800, Andrew Morton wrote:
> But the searching needs are different.  If someone has a nonlinear mmap
> of the 0-1M region of a file and then requests an mmap of the 4-5M region,
> that can just be tacked onto the 0-1M mapping's vma (can't it?).

Well, that's more of a merging criterion than a search criterion. At
any rate, while it's true that they can/could be merged arbitrarily
since they're not actually associated with any particular file offset
range, there isn't any indicator I know of now that would actually
allow this distinction wrt. mergeability to be made.


Bill
