linux-mm.kvack.org archive mirror
* [RFC] prefault optimization
@ 2003-08-08  0:20 Adam Litke
  2003-08-08  1:37 ` Andrew Morton
  2003-08-18 13:19 ` Mel Gorman
  0 siblings, 2 replies; 7+ messages in thread
From: Adam Litke @ 2003-08-08  0:20 UTC (permalink / raw)
  To: linux-mm; +Cc: Martin J. Bligh

This patch attempts to reduce page fault overhead for mmap'd files.  All 
pages in the page cache that will be managed by the current vma are 
instantiated in the page table.  This boots, but some applications fail 
(e.g. make).  I am probably missing a corner case somewhere.  Let me know 
what you think.

--Adam Litke

diff -urN linux-2.5.73-virgin/mm/memory.c linux-2.5.73-vm/mm/memory.c
--- linux-2.5.73-virgin/mm/memory.c	2003-06-22 11:32:43.000000000 -0700
+++ linux-2.5.73-vm/mm/memory.c	2003-08-07 13:01:48.000000000 -0700
@@ -1328,6 +1328,47 @@
  	return ret;
  }

+/* Try to reduce overhead from page faults by grabbing pages from the page
+ * cache and instantiating the page table entries for this vma
+ */
+static int
+do_pre_fault(struct mm_struct *mm, struct vm_area_struct *vma, pmd_t *pmd,
+		const pte_t *page_table)
+{
+	unsigned long vm_end_pgoff, offset, address;
+	struct address_space *mapping;
+	struct page *new_page;
+	pte_t *pte, entry;
+	struct pte_chain *pte_chain;
+	
+	/* the file offset corresponding to the end of this vma */
+	vm_end_pgoff = ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+	mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
+
+	/* Iterate through all pages managed by this vma */
+	for(offset = vma->vm_pgoff; offset < vm_end_pgoff; ++offset)
+	{
+		address = vma->vm_start + ((offset - vma->vm_pgoff) << PAGE_SHIFT);
+		pte = pte_offset_map(pmd, address);
+		if(pte_none(*pte)) { /* don't touch instantiated ptes */
+			new_page = find_get_page(mapping, offset);
+			if(!new_page)
+				continue;
+			
+			/* This code taken directly from do_no_page() */
+			pte_chain = pte_chain_alloc(GFP_KERNEL);
+			++mm->rss;
+			flush_icache_page(vma, new_page);
+			entry = mk_pte(new_page, vma->vm_page_prot);
+			set_pte(pte, entry);
+			pte_chain = page_add_rmap(new_page, pte, pte_chain);
+			pte_unmap(page_table);
+			update_mmu_cache(vma, address, *pte);
+			pte_chain_free(pte_chain);
+		}
+	}
+}
+
  /*
   * do_no_page() tries to create a new page mapping. It aggressively
   * tries to share with existing pages, but makes a separate copy if
@@ -1405,6 +1446,8 @@
  		set_pte(page_table, entry);
  		pte_chain = page_add_rmap(new_page, page_table, pte_chain);
  		pte_unmap(page_table);
+		//if(!write_access)
+			do_pre_fault(mm, vma, pmd, page_table);
  	} else {
  		/* One of our sibling threads was faster, back out. */
  		pte_unmap(page_table);


* Re: [RFC] prefault optimization
  2003-08-08  0:20 [RFC] prefault optimization Adam Litke
@ 2003-08-08  1:37 ` Andrew Morton
  2003-08-14 21:45   ` Adam Litke
  2003-08-14 21:47   ` Adam Litke
  2003-08-18 13:19 ` Mel Gorman
  1 sibling, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2003-08-08  1:37 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, mbligh

Adam Litke <agl@us.ibm.com> wrote:
>
> This patch attempts to reduce page fault overhead for mmap'd files.  All 
> pages in the page cache that will be managed by the current vma are 
> instantiated in the page table.  This boots, but some applications fail 
> (e.g. make).  I am probably missing a corner case somewhere.  Let me know 
> what you think.

Well it's simple enough.

I'd like to see it using find_get_pages() though.

And find a way to hold the pte page's atomic kmap across the whole pte page
(or at least a find_get_pages' chunk worth) rather than dropping and
reacquiring it all the time.

Perhaps it can use install_page() as well, rather than open-coding it?


> +		pte = pte_offset_map(pmd, address);
> +		if(pte_none(*pte)) { /* don't touch instantiated ptes */
> +			new_page = find_get_page(mapping, offset);
> +			if(!new_page)
> +				continue;
> +			
> +			/* This code taken directly from do_no_page() */
> +			pte_chain = pte_chain_alloc(GFP_KERNEL);

Cannot do a sleeping allocation while holding the atomic kmap from
pte_offset_map().  

> +			++mm->rss;
> +			flush_icache_page(vma, new_page);
> +			entry = mk_pte(new_page, vma->vm_page_prot);
> +			set_pte(pte, entry);
> +			pte_chain = page_add_rmap(new_page, pte, pte_chain);
> +			pte_unmap(page_table);
> +			update_mmu_cache(vma, address, *pte);
> +			pte_chain_free(pte_chain);
> +		}

		else
			pte_unmap(pte);



And the pte_chain handling can be optimised:

	struct pte_chain *pte_chain = NULL;

	...
	for ( ... ) {
		if (pte_chain == NULL)
			pte_chain = pte_chain_alloc();
		...
		pte_chain = page_add_rmap(page, pte_chain);
	}
	...
	pte_chain_free(pte_chain);
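
Something along these lines, perhaps (untested sketch; pte_base_map() is a
made-up helper that would kmap the base of the pte page, and the rss and
icache-flush bookkeeping is left out):

	struct pte_chain *pte_chain = NULL;
	pte_t *pte_base, *pte;
	unsigned long address;
	int i;

	/* pages[]/nr_pages as returned by find_get_pages() */
	pte_base = pte_base_map(pmd);
	for (i = 0; i < nr_pages; i++) {
		address = vma->vm_start +
			((pages[i]->index - vma->vm_pgoff) << PAGE_SHIFT);
		pte = pte_base + pte_index(address);
		if (!pte_none(*pte))
			continue;
		if (pte_chain == NULL) {
			/* drop the atomic kmap around the sleeping allocation */
			pte_unmap(pte_base);
			pte_chain = pte_chain_alloc(GFP_KERNEL);
			pte_base = pte_base_map(pmd);
			pte = pte_base + pte_index(address);
		}
		set_pte(pte, mk_pte(pages[i], vma->vm_page_prot));
		pte_chain = page_add_rmap(pages[i], pte, pte_chain);
		update_mmu_cache(vma, address, *pte);
	}
	pte_unmap(pte_base);
	pte_chain_free(pte_chain);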



* Re: [RFC] prefault optimization
  2003-08-08  1:37 ` Andrew Morton
@ 2003-08-14 21:45   ` Adam Litke
  2003-08-14 21:47   ` Adam Litke
  1 sibling, 0 replies; 7+ messages in thread
From: Adam Litke @ 2003-08-14 21:45 UTC (permalink / raw)
  To: linux-mm; +Cc: Andrew Morton

Here is the latest on the pre-fault code I posted last week.  I fixed it
up some in response to comments I received.  There is still a bug which
causes some programs to segfault (e.g. gcc).  Interestingly, man fails
the first time it is run, but subsequent runs are successful.  Please
take a look and let me know what you think.  Any ideas about that bug?

On Thu, 2003-08-07 at 18:37, Andrew Morton wrote:
> I'd like to see it using find_get_pages() though.

This implementation is simple but somewhat wasteful.  My basic testing
shows around 30% of pages returned from find_get_pages() aren't used.
 
> And find a way to hold the pte page's atomic kmap across the whole pte page

I map the pte page once at the beginning but have to drop the kmap when I
need to allocate a pte_chain.  Perhaps there is a better way to do this.

> Perhaps it can use install_page() as well, rather than open-coding it?

It seems that install_page does too much for what I need.  For starters
it zaps the pte.  There is also no need to do the pgd lookup stuff every
time because I already know the correct pte entry to use.

> Cannot do a sleeping allocation while holding the atomic kmap from
> pte_offset_map().  

I took a dirty approach to this one.  Is it ok to hold the
page_table_lock throughout this function?
 
> And the pte_chain handling can be optimised:

I think I am pretty close here.  In my brief test 10% of mapped pages
required a call to pte_chain_alloc.

--Adam

diff -urN linux-2.5.73-virgin/include/asm-i386/pgtable.h linux-2.5.73-vm/include/asm-i386/pgtable.h
--- linux-2.5.73-virgin/include/asm-i386/pgtable.h	2003-06-22 11:33:04.000000000 -0700
+++ linux-2.5.73-vm/include/asm-i386/pgtable.h	2003-08-13 07:08:18.000000000 -0700
@@ -299,12 +299,15 @@
 	((pte_t *)kmap_atomic(pmd_page(*(dir)),KM_PTE0) + pte_index(address))
 #define pte_offset_map_nested(dir, address) \
 	((pte_t *)kmap_atomic(pmd_page(*(dir)),KM_PTE1) + pte_index(address))
+#define pte_base_map(dir) \
+	((pte_t *)kmap_atomic(pmd_page(*(dir)),KM_PTE0))
 #define pte_unmap(pte) kunmap_atomic(pte, KM_PTE0)
 #define pte_unmap_nested(pte) kunmap_atomic(pte, KM_PTE1)
 #else
 #define pte_offset_map(dir, address) \
 	((pte_t *)page_address(pmd_page(*(dir))) + pte_index(address))
 #define pte_offset_map_nested(dir, address) pte_offset_map(dir, address)
+#define pte_base_map(dir) ((pte_t *)page_address(pmd_page(*(dir))))
 #define pte_unmap(pte) do { } while (0)
 #define pte_unmap_nested(pte) do { } while (0)
 #endif
diff -urN linux-2.5.73-virgin/mm/memory.c linux-2.5.73-vm/mm/memory.c
--- linux-2.5.73-virgin/mm/memory.c	2003-06-22 11:32:43.000000000 -0700
+++ linux-2.5.73-vm/mm/memory.c	2003-08-14 07:57:54.000000000 -0700
@@ -1328,6 +1328,74 @@
 	return ret;
 }
 
+#define vma_nr_pages(vma) \
+	((vma->vm_end - vma->vm_start) >> PAGE_SHIFT)
+
+/* Try to reduce overhead from page faults by grabbing pages from the page
+ * cache and instantiating the page table entries for this vma
+ */
+
+unsigned long prefault_entered = 0;
+unsigned long prefault_pages_mapped = 0;
+unsigned long prefault_pte_alloc = 0;
+unsigned long prefault_unused_pages = 0;
+
+static void
+do_pre_fault(struct mm_struct *mm, struct vm_area_struct *vma, pmd_t *pmd)
+{
+	unsigned long offset, address;
+	struct address_space *mapping;
+	struct page *new_page;
+	pte_t *pte, *pte_base;
+	struct pte_chain *pte_chain;
+	unsigned int i, num_pages;
+	struct page **pages; 
+	
+	/* debug */ ++prefault_entered;
+	pages = kmalloc(PTRS_PER_PTE * sizeof(struct page*), GFP_KERNEL);
+	mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
+	num_pages = find_get_pages(mapping, vma->vm_pgoff, PTRS_PER_PTE, pages);
+
+	pte_chain = pte_chain_alloc(GFP_KERNEL);
+	pte_base = pte_base_map(pmd);
+
+	/* Iterate through all pages managed by this vma */
+	for (i = 0; i < num_pages; ++i)
+	{
+		new_page = pages[i];
+		if (new_page->index >= (vma->vm_pgoff + vma_nr_pages(vma)))
+			break; /* The rest of the pages are not in this vma */
+		offset = new_page->index - vma->vm_pgoff;
+		address = vma->vm_start + (offset << PAGE_SHIFT);
+		pte = pte_base + pte_index(address);
+		if (pte_none(*pte)) {
+			mm->rss++;
+			flush_icache_page(vma, new_page);
+			set_pte(pte, mk_pte(new_page, vma->vm_page_prot));
+			if (pte_chain == NULL) {
+				pte_unmap(pte_base);
+				/* debug */ ++prefault_pte_alloc;
+				pte_chain = pte_chain_alloc(GFP_KERNEL);
+				pte_base = pte_base_map(pmd);
+			}
+			pte_chain = page_add_rmap(new_page, pte, pte_chain);
+			update_mmu_cache(vma, address, *pte);
+			/* debug */ ++prefault_pages_mapped;
+			pages[i] = NULL;
+		}
+	}
+	pte_unmap(pte_base);
+	pte_chain_free(pte_chain);
+
+	/* Release the pages we did not successfully add */
+	for (i = 0; i < num_pages; ++i)
+		if (pages[i]) {
+			/* debug */ ++prefault_unused_pages;
+			page_cache_release(pages[i]);
+		}
+	kfree(pages);
+}
+
 /*
  * do_no_page() tries to create a new page mapping. It aggressively
  * tries to share with existing pages, but makes a separate copy if
@@ -1416,6 +1484,8 @@
 
 	/* no need to invalidate: a not-present page shouldn't be cached */
 	update_mmu_cache(vma, address, entry);
+
+	do_pre_fault(mm, vma, pmd);
 	spin_unlock(&mm->page_table_lock);
 	ret = VM_FAULT_MAJOR;
 	goto out;



* Re: [RFC] prefault optimization
  2003-08-08  1:37 ` Andrew Morton
  2003-08-14 21:45   ` Adam Litke
@ 2003-08-14 21:47   ` Adam Litke
  1 sibling, 0 replies; 7+ messages in thread
From: Adam Litke @ 2003-08-14 21:47 UTC (permalink / raw)
  To: linux-mm; +Cc: Andrew Morton

Here is the latest on the pre-fault code I posted last week.  I fixed it
up some in response to comments I received.  There is still a bug which
causes some programs to segfault (e.g. gcc).  Interestingly, man fails
the first time it is run, but subsequent runs are successful.  Please
take a look and let me know what you think.  Any ideas about that bug?

On Thu, 2003-08-07 at 18:37, Andrew Morton wrote:
> I'd like to see it using find_get_pages() though.

This implementation is simple but somewhat wasteful.  My basic testing
shows around 30% of pages returned from find_get_pages() aren't used.
 
> And find a way to hold the pte page's atomic kmap across the whole pte page

I map the pte page once at the beginning but have to drop the kmap when I
need to allocate a pte_chain.  Perhaps there is a better way to do this.

> Perhaps it can use install_page() as well, rather than open-coding it?

It seems that install_page does too much for what I need.  For starters
it zaps the pte.  There is also no need to do the pgd lookup stuff every
time because I already know the correct pte entry to use.

> Cannot do a sleeping allocation while holding the atomic kmap from
> pte_offset_map().  

I took a dirty approach to this one.  Is it ok to hold the
page_table_lock throughout this function?
 
> And the pte_chain handling can be optimised:

I think I am pretty close here.  In my brief test 10% of mapped pages
required a call to pte_chain_alloc.

--Adam

diff -urN linux-2.5.73-virgin/include/asm-i386/pgtable.h linux-2.5.73-vm/include/asm-i386/pgtable.h
--- linux-2.5.73-virgin/include/asm-i386/pgtable.h	2003-06-22 11:33:04.000000000 -0700
+++ linux-2.5.73-vm/include/asm-i386/pgtable.h	2003-08-13 07:08:18.000000000 -0700
@@ -299,12 +299,15 @@
 	((pte_t *)kmap_atomic(pmd_page(*(dir)),KM_PTE0) + pte_index(address))
 #define pte_offset_map_nested(dir, address) \
 	((pte_t *)kmap_atomic(pmd_page(*(dir)),KM_PTE1) + pte_index(address))
+#define pte_base_map(dir) \
+	((pte_t *)kmap_atomic(pmd_page(*(dir)),KM_PTE0))
 #define pte_unmap(pte) kunmap_atomic(pte, KM_PTE0)
 #define pte_unmap_nested(pte) kunmap_atomic(pte, KM_PTE1)
 #else
 #define pte_offset_map(dir, address) \
 	((pte_t *)page_address(pmd_page(*(dir))) + pte_index(address))
 #define pte_offset_map_nested(dir, address) pte_offset_map(dir, address)
+#define pte_base_map(dir) ((pte_t *)page_address(pmd_page(*(dir))))
 #define pte_unmap(pte) do { } while (0)
 #define pte_unmap_nested(pte) do { } while (0)
 #endif
diff -urN linux-2.5.73-virgin/mm/memory.c linux-2.5.73-vm/mm/memory.c
--- linux-2.5.73-virgin/mm/memory.c	2003-06-22 11:32:43.000000000 -0700
+++ linux-2.5.73-vm/mm/memory.c	2003-08-14 07:57:54.000000000 -0700
@@ -1328,6 +1328,74 @@
 	return ret;
 }
 
+#define vma_nr_pages(vma) \
+	((vma->vm_end - vma->vm_start) >> PAGE_SHIFT)
+
+/* Try to reduce overhead from page faults by grabbing pages from the page
+ * cache and instantiating the page table entries for this vma
+ */
+
+unsigned long prefault_entered = 0;
+unsigned long prefault_pages_mapped = 0;
+unsigned long prefault_pte_alloc = 0;
+unsigned long prefault_unused_pages = 0;
+
+static void
+do_pre_fault(struct mm_struct *mm, struct vm_area_struct *vma, pmd_t *pmd)
+{
+	unsigned long offset, address;
+	struct address_space *mapping;
+	struct page *new_page;
+	pte_t *pte, *pte_base;
+	struct pte_chain *pte_chain;
+	unsigned int i, num_pages;
+	struct page **pages; 
+	
+	/* debug */ ++prefault_entered;
+	pages = kmalloc(PTRS_PER_PTE * sizeof(struct page*), GFP_KERNEL);
+	mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
+	num_pages = find_get_pages(mapping, vma->vm_pgoff, PTRS_PER_PTE, pages);
+
+	pte_chain = pte_chain_alloc(GFP_KERNEL);
+	pte_base = pte_base_map(pmd);
+
+	/* Iterate through all pages managed by this vma */
+	for (i = 0; i < num_pages; ++i)
+	{
+		new_page = pages[i];
+		if (new_page->index >= (vma->vm_pgoff + vma_nr_pages(vma)))
+			break; /* The rest of the pages are not in this vma */
+		offset = new_page->index - vma->vm_pgoff;
+		address = vma->vm_start + (offset << PAGE_SHIFT);
+		pte = pte_base + pte_index(address);
+		if (pte_none(*pte)) {
+			mm->rss++;
+			flush_icache_page(vma, new_page);
+			set_pte(pte, mk_pte(new_page, vma->vm_page_prot));
+			if (pte_chain == NULL) {
+				pte_unmap(pte_base);
+				/* debug */ ++prefault_pte_alloc;
+				pte_chain = pte_chain_alloc(GFP_KERNEL);
+				pte_base = pte_base_map(pmd);
+			}
+			pte_chain = page_add_rmap(new_page, pte, pte_chain);
+			update_mmu_cache(vma, address, *pte);
+			/* debug */ ++prefault_pages_mapped;
+			pages[i] = NULL;
+		}
+	}
+	pte_unmap(pte_base);
+	pte_chain_free(pte_chain);
+
+	/* Release the pages we did not successfully add */
+	for (i = 0; i < num_pages; ++i)
+		if (pages[i]) {
+			/* debug */ ++prefault_unused_pages;
+			page_cache_release(pages[i]);
+		}
+	kfree(pages);
+}
+
 /*
  * do_no_page() tries to create a new page mapping. It aggressively
  * tries to share with existing pages, but makes a separate copy if
@@ -1416,6 +1484,8 @@
 
 	/* no need to invalidate: a not-present page shouldn't be cached */
 	update_mmu_cache(vma, address, entry);
+
+	do_pre_fault(mm, vma, pmd);
 	spin_unlock(&mm->page_table_lock);
 	ret = VM_FAULT_MAJOR;
 	goto out;



* Re: [RFC] prefault optimization
  2003-08-08  0:20 [RFC] prefault optimization Adam Litke
  2003-08-08  1:37 ` Andrew Morton
@ 2003-08-18 13:19 ` Mel Gorman
  2003-08-18 17:15   ` Martin J. Bligh
  1 sibling, 1 reply; 7+ messages in thread
From: Mel Gorman @ 2003-08-18 13:19 UTC (permalink / raw)
  To: Adam Litke; +Cc: linux-mm, Martin J. Bligh

On Thu, 7 Aug 2003, Adam Litke wrote:

> This patch attempts to reduce page fault overhead for mmap'd files.  All
> pages in the page cache that will be managed by the current vma are
> instantiated in the page table.

I believe this could punish applications which use large numbers of shared
libraries, especially if only a small portion of library code is used.
Take something like konqueror which maps over 30 shared libraries. With
prefaulting, all the libraries will be fully faulted even if only a tiny
portion of some library code is used.  This, potentially, could put a lot
of unwanted pages into the page cache which will be a kick in the pants
for low-memory systems.

For example, I don't have audio enabled at all in konqueror, but with this
patch, it will fault in 77K of data for libaudio that won't be used.

Just my 2c

-- 
Mel Gorman
http://www.csn.ul.ie/~mel

* Re: [RFC] prefault optimization
  2003-08-18 13:19 ` Mel Gorman
@ 2003-08-18 17:15   ` Martin J. Bligh
  0 siblings, 0 replies; 7+ messages in thread
From: Martin J. Bligh @ 2003-08-18 17:15 UTC (permalink / raw)
  To: Mel Gorman, Adam Litke; +Cc: linux-mm

>> This patch attempts to reduce page fault overhead for mmap'd files.  All
>> pages in the page cache that will be managed by the current vma are
>> instantiated in the page table.
> 
> I believe this could punish applications which use large numbers of shared
> libraries, especially if only a small portion of library code is used.
> Take something like konqueror which maps over 30 shared libraries. With
> prefaulting, all the libraries will be fully faulted even if only a tiny
> portion of some library code is used.  This, potentially, could put a lot
> of unwanted pages into the page cache which will be a kick in the pants
> for low-memory systems.
> 
> For example, I don't have audio enabled at all in konqueror, but with this
> patch, it will fault in 77K of data for libaudio that won't be used.
> 
> Just my 2c

The patch is designed to only prefault in pages which are already in the
pagecache, so it should be pretty cheap. I'd agree that faulting in all
the pages from disk would be too expensive.
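
For reference, the relevant lines of the first version of the patch (quoted
from earlier in the thread) just skip anything find_get_page() doesn't
already have, so no I/O is started:

	new_page = find_get_page(mapping, offset);
	if (!new_page)
		continue;	/* not in the page cache, nothing to map */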

M.


* Re: [RFC] prefault optimization
@ 2003-08-15 16:08 Adam Litke
  0 siblings, 0 replies; 7+ messages in thread
From: Adam Litke @ 2003-08-15 16:08 UTC (permalink / raw)
  To: linux-mm; +Cc: Martin J. Bligh, Andrew Morton

Here is the latest on the pre-fault code I posted last week.  I fixed it
up some in response to comments I received.  There is still a bug which
causes some programs to segfault (e.g. gcc).  Interestingly, man fails
the first time it is run, but subsequent runs are successful.  Please
take a look and let me know what you think.  Any ideas about that bug?

On Thu, 2003-08-07 at 18:37, Andrew Morton wrote:
 > I'd like to see it using find_get_pages() though.

This implementation is simple but somewhat wasteful.  My basic testing
shows around 30% of pages returned from find_get_pages() aren't used.

 > And find a way to hold the pte page's atomic kmap across the whole pte page

I map the pte page once at the beginning but have to drop the kmap when I
need to allocate a pte_chain.  Perhaps there is a better way to do this.

 > Perhaps it can use install_page() as well, rather than open-coding it?

It seems that install_page does too much for what I need.  For starters
it zaps the pte.  There is also no need to do the pgd lookup stuff every
time because I already know the correct pte entry to use.

 > Cannot do a sleeping allocation while holding the atomic kmap from
 > pte_offset_map().

I took a dirty approach to this one.  Is it ok to hold the
page_table_lock throughout this function?

 > And the pte_chain handling can be optimised:

I think I am pretty close here.  In my brief test 10% of mapped pages
required a call to pte_chain_alloc.

--Adam

diff -urN linux-2.5.73-virgin/include/asm-i386/pgtable.h linux-2.5.73-vm/include/asm-i386/pgtable.h
--- linux-2.5.73-virgin/include/asm-i386/pgtable.h	2003-06-22 11:33:04.000000000 -0700
+++ linux-2.5.73-vm/include/asm-i386/pgtable.h	2003-08-13 07:08:18.000000000 -0700
@@ -299,12 +299,15 @@
  	((pte_t *)kmap_atomic(pmd_page(*(dir)),KM_PTE0) + pte_index(address))
  #define pte_offset_map_nested(dir, address) \
  	((pte_t *)kmap_atomic(pmd_page(*(dir)),KM_PTE1) + pte_index(address))
+#define pte_base_map(dir) \
+	((pte_t *)kmap_atomic(pmd_page(*(dir)),KM_PTE0))
  #define pte_unmap(pte) kunmap_atomic(pte, KM_PTE0)
  #define pte_unmap_nested(pte) kunmap_atomic(pte, KM_PTE1)
  #else
  #define pte_offset_map(dir, address) \
  	((pte_t *)page_address(pmd_page(*(dir))) + pte_index(address))
  #define pte_offset_map_nested(dir, address) pte_offset_map(dir, address)
+#define pte_base_map(dir) ((pte_t *)page_address(pmd_page(*(dir))))
  #define pte_unmap(pte) do { } while (0)
  #define pte_unmap_nested(pte) do { } while (0)
  #endif
diff -urN linux-2.5.73-virgin/mm/memory.c linux-2.5.73-vm/mm/memory.c
--- linux-2.5.73-virgin/mm/memory.c	2003-06-22 11:32:43.000000000 -0700
+++ linux-2.5.73-vm/mm/memory.c	2003-08-14 07:57:54.000000000 -0700
@@ -1328,6 +1328,74 @@
  	return ret;
  }

+#define vma_nr_pages(vma) \
+	((vma->vm_end - vma->vm_start) >> PAGE_SHIFT)
+
+/* Try to reduce overhead from page faults by grabbing pages from the page
+ * cache and instantiating the page table entries for this vma
+ */
+
+unsigned long prefault_entered = 0;
+unsigned long prefault_pages_mapped = 0;
+unsigned long prefault_pte_alloc = 0;
+unsigned long prefault_unused_pages = 0;
+
+static void
+do_pre_fault(struct mm_struct *mm, struct vm_area_struct *vma, pmd_t *pmd)
+{
+	unsigned long offset, address;
+	struct address_space *mapping;
+	struct page *new_page;
+	pte_t *pte, *pte_base;
+	struct pte_chain *pte_chain;
+	unsigned int i, num_pages;
+	struct page **pages;
+	
+	/* debug */ ++prefault_entered;
+	pages = kmalloc(PTRS_PER_PTE * sizeof(struct page*), GFP_KERNEL);
+	mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
+	num_pages = find_get_pages(mapping, vma->vm_pgoff, PTRS_PER_PTE, pages);
+
+	pte_chain = pte_chain_alloc(GFP_KERNEL);
+	pte_base = pte_base_map(pmd);
+
+	/* Iterate through all pages managed by this vma */
+	for (i = 0; i < num_pages; ++i)
+	{
+		new_page = pages[i];
+		if (new_page->index >= (vma->vm_pgoff + vma_nr_pages(vma)))
+			break; /* The rest of the pages are not in this vma */
+		offset = new_page->index - vma->vm_pgoff;
+		address = vma->vm_start + (offset << PAGE_SHIFT);
+		pte = pte_base + pte_index(address);
+		if (pte_none(*pte)) {
+			mm->rss++;
+			flush_icache_page(vma, new_page);
+			set_pte(pte, mk_pte(new_page, vma->vm_page_prot));
+			if (pte_chain == NULL) {
+				pte_unmap(pte_base);
+				/* debug */ ++prefault_pte_alloc;
+				pte_chain = pte_chain_alloc(GFP_KERNEL);
+				pte_base = pte_base_map(pmd);
+			}
+			pte_chain = page_add_rmap(new_page, pte, pte_chain);
+			update_mmu_cache(vma, address, *pte);
+			/* debug */ ++prefault_pages_mapped;
+			pages[i] = NULL;
+		}
+	}
+	pte_unmap(pte_base);
+	pte_chain_free(pte_chain);
+
+	/* Release the pages we did not successfully add */
+	for (i = 0; i < num_pages; ++i)
+		if (pages[i]) {
+			/* debug */ ++prefault_unused_pages;
+			page_cache_release(pages[i]);
+		}
+	kfree(pages);
+}
+
  /*
   * do_no_page() tries to create a new page mapping. It aggressively
   * tries to share with existing pages, but makes a separate copy if
@@ -1416,6 +1484,8 @@

  	/* no need to invalidate: a not-present page shouldn't be cached */
  	update_mmu_cache(vma, address, entry);
+
+	do_pre_fault(mm, vma, pmd);
  	spin_unlock(&mm->page_table_lock);
  	ret = VM_FAULT_MAJOR;
  	goto out;


Thread overview: 7+ messages
2003-08-08  0:20 [RFC] prefault optimization Adam Litke
2003-08-08  1:37 ` Andrew Morton
2003-08-14 21:45   ` Adam Litke
2003-08-14 21:47   ` Adam Litke
2003-08-18 13:19 ` Mel Gorman
2003-08-18 17:15   ` Martin J. Bligh
2003-08-15 16:08 Adam Litke
