linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC:PATCH 00/07] VM File Tails
@ 2007-08-29 20:53 Dave Kleikamp
  2007-08-29 20:53 ` [RFC:PATCH 01/07] Add tail to address space Dave Kleikamp
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Dave Kleikamp @ 2007-08-29 20:53 UTC (permalink / raw)
  To: linux-mm

This is a rewrite of my "VM File Tails" work.  The idea is to store tails
of files that are smaller than the base page size in kmalloc'ed memory,
allowing more efficient use of memory.  This is especially important when
the base page size is large, such as 64 KB on powerpc.

I had posted some patches earlier that were much more complex, and
introduced dummy pages into the page cache to account for the tails.  I
have abandoned that approach, and have arrived at a much simpler patch set.

The idea is to attach a buffer to the address space (page->mapping) to hold
the tail.  Whenever the page corresponding to the tail is requested, a new
page is allocated and the tail is unpacked to that page.  At some point,
pages that are eligible to be packed are copied into kmalloced buffers and
attached to the address space.  The eligible pages must be up-to-date, clean,
unmapped, not waiting for I/O, etc.

This is still pretty preliminary, and has passed basic unit testing.  i.e.
building a kernel.  :-)

My To-Do list includes:
- adding some statistics
- optimizing generic_file_aio_read to copy data directly from the tail,
  rather than unpacking the tail and copying from the page cache
- Investigate more aggressive places to pack tails.  It's currently only
  being done in shrink_active_list()
- benchmark!

Comments are appreciated.

The patches can also be downloaded from:
ftp://kernel.org/pub/linux/kernel/people/shaggy/vm_file_tails/vm_file_tails.2007-08-29.tar.gz

Thanks,
Shaggy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC:PATCH 01/07] Add tail to address space
  2007-08-29 20:53 [RFC:PATCH 00/07] VM File Tails Dave Kleikamp
@ 2007-08-29 20:53 ` Dave Kleikamp
  2007-08-29 20:53 ` [RFC:PATCH 02/07] Core function for packing, unpacking, and freeing file tails Dave Kleikamp
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Dave Kleikamp @ 2007-08-29 20:53 UTC (permalink / raw)
  To: linux-mm

Add tail to address space

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>

---

 fs/inode.c         |    3 +++
 include/linux/fs.h |    4 ++++
 mm/Kconfig         |    9 +++++++++
 3 files changed, 16 insertions(+)

diff -Nurp linux000/fs/inode.c linux001/fs/inode.c
--- linux000/fs/inode.c	2007-08-28 09:57:14.000000000 -0500
+++ linux001/fs/inode.c	2007-08-29 13:27:46.000000000 -0500
@@ -197,6 +197,9 @@ void inode_init_once(struct inode *inode
 	spin_lock_init(&inode->i_data.i_mmap_lock);
 	INIT_LIST_HEAD(&inode->i_data.private_list);
 	spin_lock_init(&inode->i_data.private_lock);
+#ifdef CONFIG_VM_FILE_TAILS
+	spin_lock_init(&inode->i_data.tail_lock);
+#endif
 	INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
 	INIT_LIST_HEAD(&inode->i_data.i_mmap_nonlinear);
 	spin_lock_init(&inode->i_lock);
diff -Nurp linux000/include/linux/fs.h linux001/include/linux/fs.h
--- linux000/include/linux/fs.h	2007-08-28 09:57:17.000000000 -0500
+++ linux001/include/linux/fs.h	2007-08-29 13:27:46.000000000 -0500
@@ -453,6 +453,10 @@ struct address_space {
 	spinlock_t		private_lock;	/* for use by the address_space */
 	struct list_head	private_list;	/* ditto */
 	struct address_space	*assoc_mapping;	/* ditto */
+#ifdef CONFIG_VM_FILE_TAILS
+	void			*tail;		/* file tail */
+	spinlock_t		tail_lock;	/* protect tail */
+#endif
 } __attribute__((aligned(sizeof(long))));
 	/*
 	 * On most architectures that alignment is already the case; but
diff -Nurp linux000/mm/Kconfig linux001/mm/Kconfig
--- linux000/mm/Kconfig	2007-08-28 09:57:20.000000000 -0500
+++ linux001/mm/Kconfig	2007-08-29 13:27:46.000000000 -0500
@@ -176,3 +176,12 @@ config NR_QUICK
 config VIRT_TO_BUS
 	def_bool y
 	depends on !ARCH_NO_VIRT_TO_BUS
+
+config VM_FILE_TAILS
+	bool "Store file tails in slab cache"
+	def_bool n
+	help
+	  If the data at the end of a file, or the entire file, is small,
+	  the kernel will attempt to store that data in the slab cache,
+	  rather than allocate an entire page in the page cache.
+	  If unsure, say N here.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC:PATCH 02/07] Core function for packing, unpacking, and freeing file tails
  2007-08-29 20:53 [RFC:PATCH 00/07] VM File Tails Dave Kleikamp
  2007-08-29 20:53 ` [RFC:PATCH 01/07] Add tail to address space Dave Kleikamp
@ 2007-08-29 20:53 ` Dave Kleikamp
  2007-08-29 20:53 ` [RFC:PATCH 03/07] Release tail when inode is freed Dave Kleikamp
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Dave Kleikamp @ 2007-08-29 20:53 UTC (permalink / raw)
  To: linux-mm

Core function for packing, unpacking, and freeing file tails

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>

---

 include/linux/vm_file_tail.h |   71 ++++++++++++++++++++
 mm/Makefile                  |    1 
 mm/file_tail.c               |  148 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 220 insertions(+)

diff -Nurp linux001/include/linux/vm_file_tail.h linux002/include/linux/vm_file_tail.h
--- linux001/include/linux/vm_file_tail.h	1969-12-31 18:00:00.000000000 -0600
+++ linux002/include/linux/vm_file_tail.h	2007-08-29 13:27:46.000000000 -0500
@@ -0,0 +1,71 @@
+#ifndef FILE_TAIL_H
+#define FILE_TAIL_H
+
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+
+/*
+ * This file deals with storing tails of files in buffers smaller than a page.
+ *
+ * FIXME: The contents of the file could possibly go into linux/pagemap.h.
+ */
+
+#ifdef CONFIG_VM_FILE_TAILS
+
+static inline int vm_file_tail_packed(struct address_space *mapping)
+{
+	return (mapping->tail != NULL);
+}
+
+static inline unsigned long vm_file_tail_index(struct address_space *mapping)
+{
+	return (unsigned long) (i_size_read(mapping->host) >> PAGE_CACHE_SHIFT);
+}
+
+static inline int vm_file_tail_length(struct address_space *mapping)
+{
+	return (int) (i_size_read(mapping->host) & (PAGE_CACHE_SIZE - 1));
+}
+
+static inline void vm_file_tail_free(struct address_space *mapping)
+{
+	unsigned long flags;
+	void *tail;
+	if (mapping && mapping->tail) {
+		spin_lock_irqsave(&mapping->tail_lock, flags);
+		tail = mapping->tail;
+		mapping->tail = NULL;
+		spin_unlock_irqrestore(&mapping->tail_lock, flags);
+		kfree(tail);
+	}
+}
+
+/*
+ * vm_file_tail_pack() returns 1 on success, 0 otherwise
+ *
+ * The caller must hold a reference on the page
+ */
+int vm_file_tail_pack(struct page *);
+void vm_file_tail_unpack(struct address_space *);
+
+/*
+ * Unpack the tail if it's at the specified index
+ */
+static inline void vm_file_tail_unpack_index(struct address_space *mapping,
+					     unsigned long index)
+{
+	if (index == vm_file_tail_index(mapping) && mapping->tail)
+		vm_file_tail_unpack(mapping);
+}
+
+#else /* !CONFIG_VM_FILE_TAILS */
+
+#define vm_file_tail_packed(mapping) 0
+#define vm_file_tail_free(mapping) do {} while (0)
+#define vm_file_tail_pack(page) 0
+#define vm_file_tail_unpack(mapping) do {} while (0)
+#define vm_file_tail_unpack_index(mapping, index) do {} while (0)
+
+#endif /* CONFIG_VM_FILE_TAILS */
+
+#endif	/* FILE_TAIL_H */
diff -Nurp linux001/mm/Makefile linux002/mm/Makefile
--- linux001/mm/Makefile	2007-08-28 09:57:20.000000000 -0500
+++ linux002/mm/Makefile	2007-08-29 13:27:46.000000000 -0500
@@ -29,4 +29,5 @@ obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
+obj-$(CONFIG_VM_FILE_TAILS) += file_tail.o
 
diff -Nurp linux001/mm/file_tail.c linux002/mm/file_tail.c
--- linux001/mm/file_tail.c	1969-12-31 18:00:00.000000000 -0600
+++ linux002/mm/file_tail.c	2007-08-29 13:27:46.000000000 -0500
@@ -0,0 +1,148 @@
+/*
+ *	linux/mm/file_tail.c
+ *
+ * Copyright (C) International Business Machines  Corp., 2006-2007
+ * Author: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
+ */
+
+/*
+ * VM File Tails are used to compactly store the data at the end of the
+ * file in a small SLAB-allocated buffer when the base page size is large.
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/fs.h>
+#include <linux/hardirq.h>
+#include <linux/vm_file_tail.h>
+
+/*
+ * Unpack tail into page cache.
+ *
+ * The tail is never modfied, and can be safely discarded on error
+ */
+void vm_file_tail_unpack(struct address_space *mapping)
+{
+	unsigned int flags;
+	gfp_t gfp_mask;
+	pgoff_t index;
+	void *kaddr;
+	int length;
+	struct page *page;
+	void *tail;
+
+	if (!mapping->tail)
+		return;
+
+	/* Allocate page */
+
+	if (in_atomic())
+		gfp_mask = GFP_NOWAIT;
+	else
+		gfp_mask = mapping_gfp_mask(mapping);
+
+	page = __page_cache_alloc(gfp_mask);
+
+	/* Copy data from tail to new page */
+	if (page) {
+		spin_lock_irqsave(&mapping->tail_lock, flags);
+		index = vm_file_tail_index(mapping);
+		length = vm_file_tail_length(mapping);
+		tail = mapping->tail;
+		mapping->tail = NULL;
+		spin_unlock_irqrestore(&mapping->tail_lock, flags);
+
+		if (!tail) {	/* someone else freed the tail */
+			page_cache_release(page);
+			return;
+		}
+
+		kaddr = kmap_atomic(page, KM_USER0);
+		memcpy(kaddr, tail, length);
+		memset(kaddr + length, 0, PAGE_CACHE_SIZE - length);
+		kunmap_atomic(kaddr, KM_USER0);
+
+		kfree(tail);
+
+		add_to_page_cache_lru(page, mapping, index, gfp_mask);
+		unlock_page(page);
+		page_cache_release(page);
+	} else
+		/* Free the tail */
+		vm_file_tail_free(mapping);
+}
+
+/* * Determine if the page is eligible to be packed, and if so, pack it
+ *
+ * Non-fatal if this fails.  The page will remain in the page cache.
+ */
+int vm_file_tail_pack(struct page *page)
+{
+	unsigned long flags;
+	pgoff_t index;
+	void *kaddr;
+	int length;
+	struct address_space *mapping;
+	void *tail;
+
+	if (TestSetPageLocked(page))
+		return 0;
+
+	mapping = page->mapping;
+
+	if (!mapping ||
+	    mapping->tail ||
+	    PageDirty(page) ||
+	    !PageUptodate(page) ||
+	    PageWriteback(page) ||
+	    (page_count(page) > 2) ||
+	    mapping_mapped(mapping) ||
+	    PageSwapCache(page)) {
+		unlock_page(page);
+		return 0;
+	}
+
+	index = vm_file_tail_index(mapping);
+	length = vm_file_tail_length(mapping);
+
+	if ((index != page->index) ||
+	    (length > PAGE_CACHE_SIZE / 2)) {
+		unlock_page(page);
+		return 0;
+	}
+
+	if (PagePrivate(page) && !try_to_release_page(page, 0)) {
+		unlock_page(page);
+		return 0;
+	}
+
+	tail = kmalloc(length, GFP_NOWAIT);
+	if (!tail) {
+		unlock_page(page);
+		return 0;
+	}
+
+	kaddr = kmap_atomic(page, KM_USER0);
+	memcpy(tail, kaddr, length);
+	kunmap_atomic(kaddr, KM_USER0);
+
+	spin_lock_irqsave(&mapping->tail_lock, flags);
+
+	/* Check again under spinlock */
+	if (mapping->tail || (index != vm_file_tail_index(mapping)) ||
+	   (length != vm_file_tail_length(mapping))) {
+		/* File size must have changed */
+		spin_unlock_irqrestore(&mapping->tail_lock, flags);
+		unlock_page(page);
+		return 0;
+	}
+
+	mapping->tail = tail;
+
+	spin_unlock_irqrestore(&mapping->tail_lock, flags);
+
+	remove_from_page_cache(page);
+	page_cache_release(page);	/* pagecache ref */
+	unlock_page(page);
+
+	return 1;
+}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC:PATCH 03/07] Release tail when inode is freed
  2007-08-29 20:53 [RFC:PATCH 00/07] VM File Tails Dave Kleikamp
  2007-08-29 20:53 ` [RFC:PATCH 01/07] Add tail to address space Dave Kleikamp
  2007-08-29 20:53 ` [RFC:PATCH 02/07] Core function for packing, unpacking, and freeing file tails Dave Kleikamp
@ 2007-08-29 20:53 ` Dave Kleikamp
  2007-08-29 20:53 ` [RFC:PATCH 04/07] Unpack or remove file tail when inode is resized Dave Kleikamp
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Dave Kleikamp @ 2007-08-29 20:53 UTC (permalink / raw)
  To: linux-mm

Release tail when inode is freed

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---

 fs/inode.c |    2 ++
 1 file changed, 2 insertions(+)

diff -Nurp linux002/fs/inode.c linux003/fs/inode.c
--- linux002/fs/inode.c	2007-08-29 13:27:46.000000000 -0500
+++ linux003/fs/inode.c	2007-08-29 13:27:46.000000000 -0500
@@ -10,6 +10,7 @@
 #include <linux/init.h>
 #include <linux/quotaops.h>
 #include <linux/slab.h>
+#include <linux/vm_file_tail.h>
 #include <linux/writeback.h>
 #include <linux/module.h>
 #include <linux/backing-dev.h>
@@ -245,6 +246,7 @@ void __iget(struct inode * inode)
 void clear_inode(struct inode *inode)
 {
 	might_sleep();
+	vm_file_tail_free(inode->i_mapping);
 	invalidate_inode_buffers(inode);
        
 	BUG_ON(inode->i_data.nrpages);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC:PATCH 04/07] Unpack or remove file tail when inode is resized
  2007-08-29 20:53 [RFC:PATCH 00/07] VM File Tails Dave Kleikamp
                   ` (2 preceding siblings ...)
  2007-08-29 20:53 ` [RFC:PATCH 03/07] Release tail when inode is freed Dave Kleikamp
@ 2007-08-29 20:53 ` Dave Kleikamp
  2007-08-29 20:53 ` [RFC:PATCH 05/07] find_get_page() and find_lock_page() need to unpack the tail Dave Kleikamp
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Dave Kleikamp @ 2007-08-29 20:53 UTC (permalink / raw)
  To: linux-mm

Unpack or remove file tail when inode is resized

If the inode size grows, we need to unpack the tail into a page.
If the inode shrinks, such that the entire tail is beyond the end of the
file, discard the tail.  If the file shrinks, but part of the tail is still
valid, just leave it.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---

 include/linux/fs.h |   14 ++++++++++++++
 mm/file_tail.c     |   11 +++++++++++
 2 files changed, 25 insertions(+)

diff -Nurp linux003/include/linux/fs.h linux004/include/linux/fs.h
--- linux003/include/linux/fs.h	2007-08-29 13:27:46.000000000 -0500
+++ linux004/include/linux/fs.h	2007-08-29 13:27:46.000000000 -0500
@@ -657,6 +657,19 @@ static inline loff_t i_size_read(const s
 #endif
 }
 
+#ifdef CONFIG_VM_FILE_TAILS
+void __vm_file_tail_unpack_on_resize(struct inode *, loff_t);
+
+static inline void vm_file_tail_unpack_on_resize(struct inode *inode,
+						 loff_t size)
+{
+	if (inode->i_mapping && inode->i_mapping->tail)
+		__vm_file_tail_unpack_on_resize(inode, size);
+}
+#else
+#define vm_file_tail_unpack_on_resize(mapping, new_size) do {} while (0)
+#endif
+
 /*
  * NOTE: unlike i_size_read(), i_size_write() does need locking around it
  * (normally i_mutex), otherwise on 32bit/SMP an update of i_size_seqcount
@@ -664,6 +677,7 @@ static inline loff_t i_size_read(const s
  */
 static inline void i_size_write(struct inode *inode, loff_t i_size)
 {
+	vm_file_tail_unpack_on_resize(inode, i_size);
 #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
 	write_seqcount_begin(&inode->i_size_seqcount);
 	inode->i_size = i_size;
diff -Nurp linux003/mm/file_tail.c linux004/mm/file_tail.c
--- linux003/mm/file_tail.c	2007-08-29 13:27:46.000000000 -0500
+++ linux004/mm/file_tail.c	2007-08-29 13:27:46.000000000 -0500
@@ -13,6 +13,7 @@
 #include <linux/buffer_head.h>
 #include <linux/fs.h>
 #include <linux/hardirq.h>
+#include <linux/module.h>
 #include <linux/vm_file_tail.h>
 
 /*
@@ -146,3 +147,13 @@ int vm_file_tail_pack(struct page *page)
 
 	return 1;
 }
+
+void __vm_file_tail_unpack_on_resize(struct inode *inode, loff_t new_size)
+{
+	loff_t old_size = i_size_read(inode);
+	if (new_size > old_size)
+		vm_file_tail_unpack(inode->i_mapping);
+	else if (new_size >> PAGE_CACHE_SHIFT != old_size >> PAGE_CACHE_SHIFT)
+		vm_file_tail_free(inode->i_mapping);
+}
+EXPORT_SYMBOL(__vm_file_tail_unpack_on_resize);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC:PATCH 05/07] find_get_page() and find_lock_page() need to unpack the tail
  2007-08-29 20:53 [RFC:PATCH 00/07] VM File Tails Dave Kleikamp
                   ` (3 preceding siblings ...)
  2007-08-29 20:53 ` [RFC:PATCH 04/07] Unpack or remove file tail when inode is resized Dave Kleikamp
@ 2007-08-29 20:53 ` Dave Kleikamp
  2007-08-29 20:54 ` [RFC:PATCH 06/07] For readahead, leave data in tail Dave Kleikamp
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Dave Kleikamp @ 2007-08-29 20:53 UTC (permalink / raw)
  To: linux-mm

find_get_page() and find_lock_page() need to unpack the tail

If the page being sought corresponds to the tail, and the tail is packed
in the inode, the tail must be unpacked.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---

 mm/filemap.c |    3 +++
 1 file changed, 3 insertions(+)

diff -Nurp linux004/mm/filemap.c linux005/mm/filemap.c
--- linux004/mm/filemap.c	2007-08-28 09:57:20.000000000 -0500
+++ linux005/mm/filemap.c	2007-08-29 13:27:46.000000000 -0500
@@ -24,6 +24,7 @@
 #include <linux/file.h>
 #include <linux/uio.h>
 #include <linux/hash.h>
+#include <linux/vm_file_tail.h>
 #include <linux/writeback.h>
 #include <linux/pagevec.h>
 #include <linux/blkdev.h>
@@ -597,6 +598,7 @@ struct page * find_get_page(struct addre
 {
 	struct page *page;
 
+	vm_file_tail_unpack_index(mapping, offset);
 	read_lock_irq(&mapping->tree_lock);
 	page = radix_tree_lookup(&mapping->page_tree, offset);
 	if (page)
@@ -621,6 +623,7 @@ struct page *find_lock_page(struct addre
 {
 	struct page *page;
 
+	vm_file_tail_unpack_index(mapping, offset);
 	read_lock_irq(&mapping->tree_lock);
 repeat:
 	page = radix_tree_lookup(&mapping->page_tree, offset);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC:PATCH 06/07] For readahead, leave data in tail
  2007-08-29 20:53 [RFC:PATCH 00/07] VM File Tails Dave Kleikamp
                   ` (4 preceding siblings ...)
  2007-08-29 20:53 ` [RFC:PATCH 05/07] find_get_page() and find_lock_page() need to unpack the tail Dave Kleikamp
@ 2007-08-29 20:54 ` Dave Kleikamp
  2007-08-29 20:54 ` [RFC:PATCH 07/07] shrink_active_list: pack file tails rather than move to inactive list Dave Kleikamp
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Dave Kleikamp @ 2007-08-29 20:54 UTC (permalink / raw)
  To: linux-mm

For readahead, leave data in tail

Don't unpack it until it's actually read.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---

 mm/readahead.c |    5 +++++
 1 file changed, 5 insertions(+)

diff -Nurp linux005/mm/readahead.c linux006/mm/readahead.c
--- linux005/mm/readahead.c	2007-08-28 09:57:20.000000000 -0500
+++ linux006/mm/readahead.c	2007-08-29 13:27:46.000000000 -0500
@@ -15,6 +15,7 @@
 #include <linux/backing-dev.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/pagevec.h>
+#include <linux/vm_file_tail.h>
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
 {
@@ -163,6 +164,10 @@ __do_page_cache_readahead(struct address
 		if (page_offset > end_index)
 			break;
 
+		if ((page_offset == end_index) && vm_file_tail_packed(mapping))
+			/* Tail page is already packed */
+			break;
+
 		page = radix_tree_lookup(&mapping->page_tree, page_offset);
 		if (page)
 			continue;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC:PATCH 07/07] shrink_active_list: pack file tails rather than move to inactive list
  2007-08-29 20:53 [RFC:PATCH 00/07] VM File Tails Dave Kleikamp
                   ` (5 preceding siblings ...)
  2007-08-29 20:54 ` [RFC:PATCH 06/07] For readahead, leave data in tail Dave Kleikamp
@ 2007-08-29 20:54 ` Dave Kleikamp
  2007-08-29 21:31 ` [RFC:PATCH 00/07] VM File Tails Jörn Engel
  2007-08-31 21:00 ` Luiz Fernando N. Capitulino
  8 siblings, 0 replies; 16+ messages in thread
From: Dave Kleikamp @ 2007-08-29 20:54 UTC (permalink / raw)
  To: linux-mm

The big question is how aggressively we pack the tails.  This looked like
an easy place to start.  If a page is being moved from the active list to
the inactive list, and the tail can be safely packed, that is not mapped,
not dirty, etc., the tail is packed and the page removed from the page
cache.

Right now, pages that never get off the inactive list will not be packed.

I will be soliciting ideas for other places in the code where tails can
be packed.  One of my goals is not to be too aggressive, where tails are
packed and unpacked repeatedly.  I also don't want to add too much overhead,
such as an extra scan of the inactive list.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---

 mm/vmscan.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff -Nurp linux006/mm/vmscan.c linux007/mm/vmscan.c
--- linux006/mm/vmscan.c	2007-08-28 09:57:20.000000000 -0500
+++ linux007/mm/vmscan.c	2007-08-29 13:27:46.000000000 -0500
@@ -19,6 +19,7 @@
 #include <linux/pagemap.h>
 #include <linux/init.h>
 #include <linux/highmem.h>
+#include <linux/vm_file_tail.h>
 #include <linux/vmstat.h>
 #include <linux/file.h>
 #include <linux/writeback.h>
@@ -994,7 +995,12 @@ force_reclaim_mapped:
 				list_add(&page->lru, &l_active);
 				continue;
 			}
+		} else if (vm_file_tail_pack(page)) {
+			ClearPageActive(page);
+			page_cache_release(page);
+			continue;
 		}
+
 		list_add(&page->lru, &l_inactive);
 	}
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC:PATCH 00/07] VM File Tails
  2007-08-29 20:53 [RFC:PATCH 00/07] VM File Tails Dave Kleikamp
                   ` (6 preceding siblings ...)
  2007-08-29 20:54 ` [RFC:PATCH 07/07] shrink_active_list: pack file tails rather than move to inactive list Dave Kleikamp
@ 2007-08-29 21:31 ` Jörn Engel
  2007-08-29 21:45   ` Dave Kleikamp
  2007-08-31 21:00 ` Luiz Fernando N. Capitulino
  8 siblings, 1 reply; 16+ messages in thread
From: Jörn Engel @ 2007-08-29 21:31 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: linux-mm

On Wed, 29 August 2007 16:53:25 -0400, Dave Kleikamp wrote:
>
> - benchmark!

I'd love to know how much difference this makes.  Basically four
numbers:
- number of address spaces
- bytes allocated for file tails
- number of pages allocated for non-tail storage
- number of pages allocated for tail storage

With those it should be possible to calculate how much is saved by using
tail and how much is wasted by having both tails and a page.  Putting
this in relation to the total amount of data in page cache is
interesting as well.

While not as decisive as benchmarks it may give some indication why
certain workloads benefit or suffer.

JA?rn

-- 
The rabbit runs faster than the fox, because the rabbit is rinning for
his life while the fox is only running for his dinner.
-- Aesop

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC:PATCH 00/07] VM File Tails
  2007-08-29 21:31 ` [RFC:PATCH 00/07] VM File Tails Jörn Engel
@ 2007-08-29 21:45   ` Dave Kleikamp
  2007-08-29 23:38     ` Jörn Engel
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Kleikamp @ 2007-08-29 21:45 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mm

On Wed, 2007-08-29 at 23:31 +0200, Jorn Engel wrote:
> On Wed, 29 August 2007 16:53:25 -0400, Dave Kleikamp wrote:
> >
> > - benchmark!
> 
> I'd love to know how much difference this makes.  Basically four
> numbers:
> - number of address spaces
> - bytes allocated for file tails
> - number of pages allocated for non-tail storage
> - number of pages allocated for tail storage

The last one may be tricky, since I'm allocating the tails using
kmalloc.  The data will be interspersed with other kmalloc'ed data.  We
could keep track of the bytes, and the number of tails, but we wouldn't
know exactly how the tail bytes correspond to the number of pages needed
to store them.

> With those it should be possible to calculate how much is saved by using
> tail and how much is wasted by having both tails and a page.  Putting
> this in relation to the total amount of data in page cache is
> interesting as well.
> 
> While not as decisive as benchmarks it may give some indication why
> certain workloads benefit or suffer.
> 
> Jorn
-- 
David Kleikamp
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC:PATCH 00/07] VM File Tails
  2007-08-29 21:45   ` Dave Kleikamp
@ 2007-08-29 23:38     ` Jörn Engel
  2007-08-30  2:15       ` Dave Kleikamp
  0 siblings, 1 reply; 16+ messages in thread
From: Jörn Engel @ 2007-08-29 23:38 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: Jörn Engel, linux-mm

On Wed, 29 August 2007 21:45:42 +0000, Dave Kleikamp wrote:
> On Wed, 2007-08-29 at 23:31 +0200, JA?rn Engel wrote:
> > On Wed, 29 August 2007 16:53:25 -0400, Dave Kleikamp wrote:
> > >
> > > - benchmark!
> > 
> > I'd love to know how much difference this makes.  Basically four
> > numbers:
> > - number of address spaces
> > - bytes allocated for file tails
> > - number of pages allocated for non-tail storage
> > - number of pages allocated for tail storage
> 
> The last one may be tricky, since I'm allocating the tails using
> kmalloc.  The data will be interspersed with other kmalloc'ed data.  We
> could keep track of the bytes, and the number of tails, but we wouldn't
> know exactly how the tail bytes correspond to the number of pages needed
> to store them.

Sorry, I should have been more precise.  Under some circumstances like
mmap() you have to allocate a page and copy the tail to that page.  My
last point was about the number of such pages, not the number of pages
buried in slab caches.

Iiuc your current implementation would keep the kmalloc()-allocated tail
in the address space and _additionally_ have a full page for the same
data.  So the patches aimed to save memory may actually waste memory and
depending on circumstances may waste more than they save.  Or did I
misinterpret something?

JA?rn

-- 
It is better to die of hunger having lived without grief and fear,
than to live with a troubled spirit amid abundance.
-- Epictetus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC:PATCH 00/07] VM File Tails
  2007-08-29 23:38     ` Jörn Engel
@ 2007-08-30  2:15       ` Dave Kleikamp
  2007-08-30 10:11         ` Jörn Engel
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Kleikamp @ 2007-08-30  2:15 UTC (permalink / raw)
  To: Jörn Engel; +Cc: linux-mm

On Thu, 2007-08-30 at 01:38 +0200, Jorn Engel wrote:
> On Wed, 29 August 2007 21:45:42 +0000, Dave Kleikamp wrote:
> > On Wed, 2007-08-29 at 23:31 +0200, Jorn Engel wrote:
> > > On Wed, 29 August 2007 16:53:25 -0400, Dave Kleikamp wrote:
> > > >
> > > > - benchmark!
> > > 
> > > I'd love to know how much difference this makes.  Basically four
> > > numbers:
> > > - number of address spaces
> > > - bytes allocated for file tails
> > > - number of pages allocated for non-tail storage
> > > - number of pages allocated for tail storage
> > 
> > The last one may be tricky, since I'm allocating the tails using
> > kmalloc.  The data will be interspersed with other kmalloc'ed data.  We
> > could keep track of the bytes, and the number of tails, but we wouldn't
> > know exactly how the tail bytes correspond to the number of pages needed
> > to store them.
> 
> Sorry, I should have been more precise.  Under some circumstances like
> mmap() you have to allocate a page and copy the tail to that page.  My
> last point was about the number of such pages, not the number of pages
> buried in slab caches.
> 
> Iiuc your current implementation would keep the kmalloc()-allocated tail
> in the address space and _additionally_ have a full page for the same
> data.  So the patches aimed to save memory may actually waste memory and
> depending on circumstances may waste more than they save.  Or did I
> misinterpret something?

Once the data is packed into the tail, the page is freed.  Later if the
page is needed, a new page is allocated and the tail is unpacked into
it.  Then the tail is freed (via kfree).
-- 
David Kleikamp
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC:PATCH 00/07] VM File Tails
  2007-08-30  2:15       ` Dave Kleikamp
@ 2007-08-30 10:11         ` Jörn Engel
  0 siblings, 0 replies; 16+ messages in thread
From: Jörn Engel @ 2007-08-30 10:11 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: Jörn Engel, linux-mm

On Wed, 29 August 2007 21:15:11 -0500, Dave Kleikamp wrote:
> 
> Once the data is packed into the tail, the page is freed.  Later if the
> page is needed, a new page is allocated and the tail is unpacked into
> it.  Then the tail is freed (via kfree).
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Good.  That part had evaded me.

JA?rn

-- 
There are three principal ways to lose money: wine, women, and engineers.
While the first two are more pleasant, the third is by far the more certain.
-- Baron Rothschild

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC:PATCH 00/07] VM File Tails
  2007-08-29 20:53 [RFC:PATCH 00/07] VM File Tails Dave Kleikamp
                   ` (7 preceding siblings ...)
  2007-08-29 21:31 ` [RFC:PATCH 00/07] VM File Tails Jörn Engel
@ 2007-08-31 21:00 ` Luiz Fernando N. Capitulino
  2007-08-31 21:47   ` Dave Kleikamp
  8 siblings, 1 reply; 16+ messages in thread
From: Luiz Fernando N. Capitulino @ 2007-08-31 21:00 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: linux-mm

 Hi Dave,

Em Wed, 29 Aug 2007 16:53:25 -0400
Dave Kleikamp <shaggy@linux.vnet.ibm.com> escreveu:

| This is a rewrite of my "VM File Tails" work.  The idea is to store tails
| of files that are smaller than the base page size in kmalloc'ed memory,
| allowing more efficient use of memory.  This is especially important when
| the base page size is large, such as 64 KB on powerpc.

 I've got the OOPS below while trying this series.

 It has happened while halting the machine, but before this one I've
got two hangs and a OOPS, but was unable to get the backtrace because
the serial console wasn't working properly. Now it did.

 I've tried to reproduce it many times w/o success.

[ 2618.789047] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000044
[ 2618.891267]  printing eip:
[ 2618.923588] c0157120
[ 2618.949679] *pde = 00000000
[ 2618.983052] Oops: 0000 [#1]
[ 2619.016400] SMP 
[ 2619.038542] Modules linked in: nfs lockd nfs_acl sunrpc capability commoncap af_packet ipv6 ide_cd ide_core binfmt_misd
[ 2619.546147] CPU:    0
[ 2619.546148] EIP:    0060:[<c0157120>]    Not tainted VLI
[ 2619.546149] EFLAGS: 00010286   (2.6.23-rc4-vm1 #5)
[ 2619.694076] EIP is at find_get_page+0x10/0xa0
[ 2619.746151] eax: c0368ee0   ebx: d96c0060   ecx: 00000000   edx: 0000000a
[ 2619.827340] esi: d96c0000   edi: c0368ee0   ebp: d8627e28   esp: d8627e18
[ 2619.908526] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
[ 2619.978280] Process udevd (pid: 18087, ti=d8627000 task=d42f3080 task.ti=d8627000)
[ 2620.066743] Stack: 0000000a d96c0060 d96c0000 d96c0018 d8627e30 c016cbef d8627e74 c01721c5 
[ 2620.167577]        d8627f7c c0178b5f 00008001 c03c70c0 d8627ed0 00000000 d96c0060 c01753b1 
[ 2620.268412]        d96c0114 0000000a d8627e84 00000000 d96c0060 00000019 00000000 d8627e98 
[ 2620.369247] Call Trace:
[ 2620.400641]  [<c01053fa>] show_trace_log_lvl+0x1a/0x30
[ 2620.462181]  [<c01054bb>] show_stack_log_lvl+0xab/0xd0
[ 2620.523720]  [<c01056b1>] show_registers+0x1d1/0x2d0
[ 2620.583183]  [<c01058c6>] die+0x116/0x250
[ 2620.631208]  [<c011bb6b>] do_page_fault+0x28b/0x6a0
[ 2620.689631]  [<c02e80ca>] error_code+0x72/0x78
[ 2620.742854]  [<c016cbef>] lookup_swap_cache+0xf/0x30
[ 2620.802316]  [<c01721c5>] shmem_getpage+0x225/0x690
[ 2620.860736]  [<c017271a>] shmem_fault+0x7a/0xb0
[ 2620.914999]  [<c01630d5>] __do_fault+0x55/0x3a0
[ 2620.969265]  [<c0165677>] handle_mm_fault+0x107/0x740
[ 2621.029765]  [<c011bcfd>] do_page_fault+0x41d/0x6a0
[ 2621.088186]  [<c02e80ca>] error_code+0x72/0x78
[ 2621.141411]  =======================
[ 2621.184134] Code: c3 0f 0b eb fe 8d b6 00 00 00 00 8b 52 0c eb b8 8d 74 26 00 8d bc 27 00 00 00 00 55 89 e5 57 89 c7 5 
[ 2621.416574] EIP: [<c0157120>] find_get_page+0x10/0xa0 SS:ESP 0068:d8627e18
[ 2621.499188] note: udevd[18087] exited with preempt_count 1


-- 
Luiz Fernando N. Capitulino

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC:PATCH 00/07] VM File Tails
  2007-08-31 21:00 ` Luiz Fernando N. Capitulino
@ 2007-08-31 21:47   ` Dave Kleikamp
  2007-09-03 21:09     ` Luiz Fernando N. Capitulino
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Kleikamp @ 2007-08-31 21:47 UTC (permalink / raw)
  To: Luiz Fernando N. Capitulino; +Cc: linux-mm

On Fri, 2007-08-31 at 18:00 -0300, Luiz Fernando N. Capitulino wrote:
> Hi Dave,
> 
> Em Wed, 29 Aug 2007 16:53:25 -0400
> Dave Kleikamp <shaggy@linux.vnet.ibm.com> escreveu:
> 
> | This is a rewrite of my "VM File Tails" work.  The idea is to store tails
> | of files that are smaller than the base page size in kmalloc'ed memory,
> | allowing more efficient use of memory.  This is especially important when
> | the base page size is large, such as 64 KB on powerpc.
> 
>  I've got the OOPS below while trying this series.
> 
>  It has happened while halting the machine, but before this one I've
> got two hangs and a OOPS, but was unable to get the backtrace because
> the serial console wasn't working properly. Now it did.
> 
>  I've tried to reproduce it many times w/o success.
> 
> [ 2618.789047] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000044
> [ 2618.891267]  printing eip:
> [ 2618.923588] c0157120
> [ 2618.949679] *pde = 00000000
> [ 2618.983052] Oops: 0000 [#1]
> [ 2619.016400] SMP 
> [ 2619.038542] Modules linked in: nfs lockd nfs_acl sunrpc capability commoncap af_packet ipv6 ide_cd ide_core binfmt_misd
> [ 2619.546147] CPU:    0
> [ 2619.546148] EIP:    0060:[<c0157120>]    Not tainted VLI
> [ 2619.546149] EFLAGS: 00010286   (2.6.23-rc4-vm1 #5)
> [ 2619.694076] EIP is at find_get_page+0x10/0xa0
> [ 2619.746151] eax: c0368ee0   ebx: d96c0060   ecx: 00000000   edx: 0000000a
> [ 2619.827340] esi: d96c0000   edi: c0368ee0   ebp: d8627e28   esp: d8627e18
> [ 2619.908526] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> [ 2619.978280] Process udevd (pid: 18087, ti=d8627000 task=d42f3080 task.ti=d8627000)
> [ 2620.066743] Stack: 0000000a d96c0060 d96c0000 d96c0018 d8627e30 c016cbef d8627e74 c01721c5 
> [ 2620.167577]        d8627f7c c0178b5f 00008001 c03c70c0 d8627ed0 00000000 d96c0060 c01753b1 
> [ 2620.268412]        d96c0114 0000000a d8627e84 00000000 d96c0060 00000019 00000000 d8627e98 
> [ 2620.369247] Call Trace:
> [ 2620.400641]  [<c01053fa>] show_trace_log_lvl+0x1a/0x30
> [ 2620.462181]  [<c01054bb>] show_stack_log_lvl+0xab/0xd0
> [ 2620.523720]  [<c01056b1>] show_registers+0x1d1/0x2d0
> [ 2620.583183]  [<c01058c6>] die+0x116/0x250
> [ 2620.631208]  [<c011bb6b>] do_page_fault+0x28b/0x6a0
> [ 2620.689631]  [<c02e80ca>] error_code+0x72/0x78
> [ 2620.742854]  [<c016cbef>] lookup_swap_cache+0xf/0x30
> [ 2620.802316]  [<c01721c5>] shmem_getpage+0x225/0x690
> [ 2620.860736]  [<c017271a>] shmem_fault+0x7a/0xb0
> [ 2620.914999]  [<c01630d5>] __do_fault+0x55/0x3a0
> [ 2620.969265]  [<c0165677>] handle_mm_fault+0x107/0x740
> [ 2621.029765]  [<c011bcfd>] do_page_fault+0x41d/0x6a0
> [ 2621.088186]  [<c02e80ca>] error_code+0x72/0x78
> [ 2621.141411]  =======================
> [ 2621.184134] Code: c3 0f 0b eb fe 8d b6 00 00 00 00 8b 52 0c eb b8 8d 74 26 00 8d bc 27 00 00 00 00 55 89 e5 57 89 c7 5 
> [ 2621.416574] EIP: [<c0157120>] find_get_page+0x10/0xa0 SS:ESP 0068:d8627e18
> [ 2621.499188] note: udevd[18087] exited with preempt_count 1

I'm not sure exactly what's going on.  mapping->host can't be NULL, can
it?  This patch is an improvement, but I'm not sure if it will fix the
problem.  I won't have much time to look at this until next week, but
feel free to give this a try.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>

diff -Nurp linux.orig/include/linux/vm_file_tail.h linux/include/linux/vm_file_tail.h
--- linux.orig/include/linux/vm_file_tail.h	2007-08-29 13:27:46.000000000 -0500
+++ linux/include/linux/vm_file_tail.h	2007-08-31 16:25:49.000000000 -0500
@@ -54,7 +54,7 @@ void vm_file_tail_unpack(struct address_
 static inline void vm_file_tail_unpack_index(struct address_space *mapping,
 					     unsigned long index)
 {
-	if (index == vm_file_tail_index(mapping) && mapping->tail)
+	if (mapping->tail && index == vm_file_tail_index(mapping))
 		vm_file_tail_unpack(mapping);
 }
 

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC:PATCH 00/07] VM File Tails
  2007-08-31 21:47   ` Dave Kleikamp
@ 2007-09-03 21:09     ` Luiz Fernando N. Capitulino
  0 siblings, 0 replies; 16+ messages in thread
From: Luiz Fernando N. Capitulino @ 2007-09-03 21:09 UTC (permalink / raw)
  To: Dave Kleikamp; +Cc: linux-mm, lcapitulino

Em Fri, 31 Aug 2007 16:47:06 -0500
Dave Kleikamp <shaggy@linux.vnet.ibm.com> escreveu:

| I'm not sure exactly what's going on.  mapping->host can't be NULL, can
| it?  This patch is an improvement, but I'm not sure if it will fix the
| problem.  I won't have much time to look at this until next week, but
| feel free to give this a try.
| 
| Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
| 
| diff -Nurp linux.orig/include/linux/vm_file_tail.h linux/include/linux/vm_file_tail.h
| --- linux.orig/include/linux/vm_file_tail.h	2007-08-29 13:27:46.000000000 -0500
| +++ linux/include/linux/vm_file_tail.h	2007-08-31 16:25:49.000000000 -0500
| @@ -54,7 +54,7 @@ void vm_file_tail_unpack(struct address_
|  static inline void vm_file_tail_unpack_index(struct address_space *mapping,
|  					     unsigned long index)
|  {
| -	if (index == vm_file_tail_index(mapping) && mapping->tail)
| +	if (mapping->tail && index == vm_file_tail_index(mapping))
|  		vm_file_tail_unpack(mapping);
|  }

 Ok, looks like it's fixed. I've ran the kernel with this patch
applied for a few hours and didn't get any problem (w/o this
patch I was getting the OOPS in a matter of minutes).

 Btw, in vm_file_tail_pack() when checking the size with the
spinlock held you have to free tail if things doesn't match
right?

 What about the following patch (only compile tested):

[PATCH]: vm_file_tail_pack() cleanup

  1. Fix a possible memory leak
  2. Add page_not_eligible()
  3. Do not duplicate exit code

---
 mm/file_tail.c |   65 +++++++++++++++++++++++++++++----------------------------
 1 file changed, 34 insertions(+), 31 deletions(-)

--- linux-2.6-vm.orig/mm/file_tail.c
+++ linux-2.6-vm/mm/file_tail.c
@@ -72,55 +72,56 @@ void vm_file_tail_unpack(struct address_
 		vm_file_tail_free(mapping);
 }
 
+static int page_not_eligible(struct page *page)
+{
+	if (!page->mapping || page->mapping->tail)
+		return 1;
+
+	if (PageDirty(page) || !PageUptodate(page) || PageWriteback(page))
+		return 1;
+
+	if ((page_count(page) > 2) || mapping_mapped(page->mapping) ||
+	    PageSwapCache(page))
+		return 1;
+
+	return 0;
+}
+
 /* * Determine if the page is eligible to be packed, and if so, pack it
  *
- * Non-fatal if this fails.  The page will remain in the page cache.
+ * Non-fatal if this fails. The page will remain in the page cache.
+ * 
+ * Returns 1 if the page was packed, 0 otherwise
  */
 int vm_file_tail_pack(struct page *page)
 {
 	unsigned long flags;
 	pgoff_t index;
 	void *kaddr;
-	int length;
+	int length, ret = 0;
 	struct address_space *mapping;
 	void *tail;
 
 	if (TestSetPageLocked(page))
 		return 0;
 
-	mapping = page->mapping;
-
-	if (!mapping ||
-	    mapping->tail ||
-	    PageDirty(page) ||
-	    !PageUptodate(page) ||
-	    PageWriteback(page) ||
-	    (page_count(page) > 2) ||
-	    mapping_mapped(mapping) ||
-	    PageSwapCache(page)) {
-		unlock_page(page);
-		return 0;
-	}
+	if (page_not_eligible(page))
+		goto out;
 
+	mapping = page->mapping;
 	index = vm_file_tail_index(mapping);
 	length = vm_file_tail_length(mapping);
 
 	if ((index != page->index) ||
-	    (length > PAGE_CACHE_SIZE / 2)) {
-		unlock_page(page);
-		return 0;
-	}
+	    (length > PAGE_CACHE_SIZE / 2))
+		goto out;
 
-	if (PagePrivate(page) && !try_to_release_page(page, 0)) {
-		unlock_page(page);
-		return 0;
-	}
+	if (PagePrivate(page) && !try_to_release_page(page, 0))
+		goto out;
 
 	tail = kmalloc(length, GFP_NOWAIT);
-	if (!tail) {
-		unlock_page(page);
-		return 0;
-	}
+	if (!tail)
+		goto out;
 
 	kaddr = kmap_atomic(page, KM_USER0);
 	memcpy(tail, kaddr, length);
@@ -133,8 +134,8 @@ int vm_file_tail_pack(struct page *page)
 	   (length != vm_file_tail_length(mapping))) {
 		/* File size must have changed */
 		spin_unlock_irqrestore(&mapping->tail_lock, flags);
-		unlock_page(page);
-		return 0;
+		kfree(tail);
+		goto out;
 	}
 
 	mapping->tail = tail;
@@ -143,9 +144,11 @@ int vm_file_tail_pack(struct page *page)
 
 	remove_from_page_cache(page);
 	page_cache_release(page);	/* pagecache ref */
-	unlock_page(page);
+	ret = 1;
 
-	return 1;
+out:	
+	unlock_page(page);
+	return ret;
 }
 
 void __vm_file_tail_unpack_on_resize(struct inode *inode, loff_t new_size)


-- 
Luiz Fernando N. Capitulino

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2007-09-03 21:09 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-29 20:53 [RFC:PATCH 00/07] VM File Tails Dave Kleikamp
2007-08-29 20:53 ` [RFC:PATCH 01/07] Add tail to address space Dave Kleikamp
2007-08-29 20:53 ` [RFC:PATCH 02/07] Core function for packing, unpacking, and freeing file tails Dave Kleikamp
2007-08-29 20:53 ` [RFC:PATCH 03/07] Release tail when inode is freed Dave Kleikamp
2007-08-29 20:53 ` [RFC:PATCH 04/07] Unpack or remove file tail when inode is resized Dave Kleikamp
2007-08-29 20:53 ` [RFC:PATCH 05/07] find_get_page() and find_lock_page() need to unpack the tail Dave Kleikamp
2007-08-29 20:54 ` [RFC:PATCH 06/07] For readahead, leave data in tail Dave Kleikamp
2007-08-29 20:54 ` [RFC:PATCH 07/07] shrink_active_list: pack file tails rather than move to inactive list Dave Kleikamp
2007-08-29 21:31 ` [RFC:PATCH 00/07] VM File Tails Jörn Engel
2007-08-29 21:45   ` Dave Kleikamp
2007-08-29 23:38     ` Jörn Engel
2007-08-30  2:15       ` Dave Kleikamp
2007-08-30 10:11         ` Jörn Engel
2007-08-31 21:00 ` Luiz Fernando N. Capitulino
2007-08-31 21:47   ` Dave Kleikamp
2007-09-03 21:09     ` Luiz Fernando N. Capitulino

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox