* [RFC:PATCH 00/09] VM File Tails
@ 2007-11-08 19:47 Dave Kleikamp
From: Dave Kleikamp @ 2007-11-08 19:47 UTC (permalink / raw)
To: linux-fsdevel, linux-mm
This is the latest version of my "VM File Tails" work. The idea is to
store tails of files that are smaller than the base page size in kmalloc'ed
memory, allowing more efficient use of memory. This is especially important
when the base page size is large, such as 64 KB on powerpc.
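To make the potential saving concrete, here is a minimal userspace sketch of the arithmetic, assuming a 64 KB base page. The names (SKETCH_PAGE_SHIFT, sketch_tail_index and so on) are made up for illustration and are not part of the patches, and the "saved" figure ignores any rounding the slab allocator applies to the buffer size.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: a 64 KB base page. */
#define SKETCH_PAGE_SHIFT 16
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Index of the page that would hold the last bytes of the file. */
static uint64_t sketch_tail_index(uint64_t file_size)
{
    return file_size >> SKETCH_PAGE_SHIFT;
}

/* Bytes of file data in that last page (0 if the size is page-aligned). */
static unsigned long sketch_tail_length(uint64_t file_size)
{
    return (unsigned long)(file_size & (SKETCH_PAGE_SIZE - 1));
}

/* Page-cache bytes not needed if the tail lives in an exact-size buffer. */
static unsigned long sketch_bytes_saved(uint64_t file_size)
{
    unsigned long len = sketch_tail_length(file_size);

    assert(len < SKETCH_PAGE_SIZE); /* sanity check on the mask above */
    return len ? SKETCH_PAGE_SIZE - len : 0;
}
```

For a 70000-byte file this gives a tail at page index 1 holding 4464 bytes, so a full 64 KB page would carry 61072 bytes of padding.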
So far, my testing hasn't resulted in any performance gains. The workloads
prompting this work are more involved, so more testing is needed. Right
now, I don't have a case for inclusion of these patches, but there was
interest in the community, so here they are.
These patches are built against 2.6.24-rc2.
I had posted some patches earlier that were much more complex, and
introduced dummy pages into the page cache to account for the tails. I
have abandoned that approach, and have arrived at a much simpler patch set.
The idea is to attach a buffer to the address space (page->mapping) to hold
the tail. Whenever the page corresponding to the tail is requested, a new
page is allocated and the tail is unpacked to that page. At some point,
pages that are eligible to be packed are copied into kmalloc'ed buffers and
attached to the address space. The eligible pages must be up-to-date, clean,
unmapped, not waiting for I/O, etc.
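The pack/unpack cycle described above can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: struct sketch_mapping is a hypothetical stand-in for the address space, malloc() stands in for kmalloc() and the page allocator, the half-page size limit is borrowed from patch 02, and all locking and page-state checks are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 65536UL /* assumed 64 KB base page */

/* Hypothetical stand-in for the per-file state; not the kernel structure. */
struct sketch_mapping {
    unsigned long long size; /* file size in bytes */
    void *tail;              /* packed tail, or NULL */
};

static unsigned long sketch_tail_len(const struct sketch_mapping *m)
{
    return (unsigned long)(m->size & (SKETCH_PAGE_SIZE - 1));
}

/* Copy the valid bytes of the last page into a small buffer; 1 on success. */
static int sketch_pack(struct sketch_mapping *m, const unsigned char *page)
{
    unsigned long len = sketch_tail_len(m);

    assert(m && page);
    /* Already packed, nothing to pack (a choice of this sketch), or too big. */
    if (m->tail || len == 0 || len > SKETCH_PAGE_SIZE / 2)
        return 0;
    m->tail = malloc(len);
    if (!m->tail)
        return 0;
    memcpy(m->tail, page, len);
    return 1;
}

/* Rebuild a full page from the tail, zero-filling past EOF; NULL on failure. */
static unsigned char *sketch_unpack(struct sketch_mapping *m)
{
    unsigned long len = sketch_tail_len(m);
    unsigned char *page;

    if (!m->tail)
        return NULL;
    page = malloc(SKETCH_PAGE_SIZE);
    if (page) {
        memcpy(page, m->tail, len);
        memset(page + len, 0, SKETCH_PAGE_SIZE - len);
    }
    free(m->tail); /* the tail is dropped even if the page allocation failed */
    m->tail = NULL;
    return page;
}
```

Because only clean, up-to-date pages are packed, the tail is just a cached copy of on-disk data, which is why the sketch can drop it when the page allocation fails.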
Since the last time I posted:
- I optimized generic_file_aio_read to copy data directly from the tail,
rather than unpacking the tail and copying from the page cache
- Luiz Capitulino contributed a patch to add statistics in
/sys/kernel/debug/vm_tail/
My To-Do list includes:
- Investigate more aggressive places to pack tails. It's currently only
being done in shrink_active_list()
- benchmark!
Comments are appreciated.
The patches can also be downloaded from:
ftp://kernel.org/pub/linux/kernel/people/shaggy/vm_file_tails/
Thanks,
Shaggy
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [RFC:PATCH 01/09] Add tail to address space
From: Dave Kleikamp @ 2007-11-08 19:47 UTC (permalink / raw)
To: linux-fsdevel, linux-mm
Add tail to address space
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---
fs/inode.c | 3 +++
include/linux/fs.h | 4 ++++
mm/Kconfig | 9 +++++++++
3 files changed, 16 insertions(+)
diff -Nurp linux000/fs/inode.c linux001/fs/inode.c
--- linux000/fs/inode.c 2007-11-07 08:13:54.000000000 -0600
+++ linux001/fs/inode.c 2007-11-08 10:49:46.000000000 -0600
@@ -213,6 +213,9 @@ void inode_init_once(struct inode *inode
spin_lock_init(&inode->i_data.i_mmap_lock);
INIT_LIST_HEAD(&inode->i_data.private_list);
spin_lock_init(&inode->i_data.private_lock);
+#ifdef CONFIG_VM_FILE_TAILS
+ spin_lock_init(&inode->i_data.tail_lock);
+#endif
INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
INIT_LIST_HEAD(&inode->i_data.i_mmap_nonlinear);
i_size_ordered_init(inode);
diff -Nurp linux000/include/linux/fs.h linux001/include/linux/fs.h
--- linux000/include/linux/fs.h 2007-11-07 08:13:59.000000000 -0600
+++ linux001/include/linux/fs.h 2007-11-08 10:49:46.000000000 -0600
@@ -511,6 +511,10 @@ struct address_space {
spinlock_t private_lock; /* for use by the address_space */
struct list_head private_list; /* ditto */
struct address_space *assoc_mapping; /* ditto */
+#ifdef CONFIG_VM_FILE_TAILS
+ void *tail; /* file tail */
+ spinlock_t tail_lock; /* protect tail */
+#endif
} __attribute__((aligned(sizeof(long))));
/*
* On most architectures that alignment is already the case; but
diff -Nurp linux000/mm/Kconfig linux001/mm/Kconfig
--- linux000/mm/Kconfig 2007-11-07 08:14:01.000000000 -0600
+++ linux001/mm/Kconfig 2007-11-08 10:49:46.000000000 -0600
@@ -194,3 +194,12 @@ config NR_QUICK
config VIRT_TO_BUS
def_bool y
depends on !ARCH_NO_VIRT_TO_BUS
+
+config VM_FILE_TAILS
+ bool "Store file tails in slab cache"
+ default n
+ help
+ If the data at the end of a file, or the entire file, is small,
+ the kernel will attempt to store that data in the slab cache,
+ rather than allocate an entire page in the page cache.
+ If unsure, say N here.
* [RFC:PATCH 02/09] Core function for packing, unpacking, and freeing file tails
From: Dave Kleikamp @ 2007-11-08 19:47 UTC (permalink / raw)
To: linux-fsdevel, linux-mm
Core function for packing, unpacking, and freeing file tails
Cleanups by "Luiz Fernando N. Capitulino" <lcapitulino@mandriva.com.br>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---
include/linux/vm_file_tail.h | 66 ++++++++++++++++
mm/Makefile | 1
mm/file_tail.c | 169 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 236 insertions(+)
diff -Nurp linux001/include/linux/vm_file_tail.h linux002/include/linux/vm_file_tail.h
--- linux001/include/linux/vm_file_tail.h 1969-12-31 18:00:00.000000000 -0600
+++ linux002/include/linux/vm_file_tail.h 2007-11-08 10:49:46.000000000 -0600
@@ -0,0 +1,66 @@
+#ifndef FILE_TAIL_H
+#define FILE_TAIL_H
+
+#include <linux/fs.h>
+#include <linux/pagemap.h>
+
+/*
+ * This file deals with storing tails of files in buffers smaller than a page.
+ *
+ * FIXME: The contents of the file could possibly go into linux/pagemap.h.
+ */
+
+#ifdef CONFIG_VM_FILE_TAILS
+
+static inline int vm_file_tail_packed(struct address_space *mapping)
+{
+ return (mapping->tail != NULL);
+}
+
+static inline unsigned long vm_file_tail_index(struct address_space *mapping)
+{
+ return (unsigned long) (i_size_read(mapping->host) >> PAGE_CACHE_SHIFT);
+}
+
+static inline int vm_file_tail_length(struct address_space *mapping)
+{
+ return (int) (i_size_read(mapping->host) & (PAGE_CACHE_SIZE - 1));
+}
+
+void __vm_file_tail_free(struct address_space *);
+
+static inline void vm_file_tail_free(struct address_space *mapping)
+{
+ if (mapping && mapping->tail)
+ __vm_file_tail_free(mapping);
+}
+
+/*
+ * vm_file_tail_pack() returns 1 on success, 0 otherwise
+ *
+ * The caller must hold a reference on the page
+ */
+int vm_file_tail_pack(struct page *);
+void vm_file_tail_unpack(struct address_space *);
+
+/*
+ * Unpack the tail if it's at the specified index
+ */
+static inline void vm_file_tail_unpack_index(struct address_space *mapping,
+ unsigned long index)
+{
+ if (mapping->tail && index == vm_file_tail_index(mapping))
+ vm_file_tail_unpack(mapping);
+}
+
+#else /* !CONFIG_VM_FILE_TAILS */
+
+#define vm_file_tail_packed(mapping) 0
+#define vm_file_tail_free(mapping) do {} while (0)
+#define vm_file_tail_pack(page) 0
+#define vm_file_tail_unpack(mapping) do {} while (0)
+#define vm_file_tail_unpack_index(mapping, index) do {} while (0)
+
+#endif /* CONFIG_VM_FILE_TAILS */
+
+#endif /* FILE_TAIL_H */
diff -Nurp linux001/mm/Makefile linux002/mm/Makefile
--- linux001/mm/Makefile 2007-11-07 08:14:01.000000000 -0600
+++ linux002/mm/Makefile 2007-11-08 10:49:46.000000000 -0600
@@ -30,4 +30,5 @@ obj-$(CONFIG_FS_XIP) += filemap_xip.o
obj-$(CONFIG_MIGRATION) += migrate.o
obj-$(CONFIG_SMP) += allocpercpu.o
obj-$(CONFIG_QUICKLIST) += quicklist.o
+obj-$(CONFIG_VM_FILE_TAILS) += file_tail.o
diff -Nurp linux001/mm/file_tail.c linux002/mm/file_tail.c
--- linux001/mm/file_tail.c 1969-12-31 18:00:00.000000000 -0600
+++ linux002/mm/file_tail.c 2007-11-08 10:49:46.000000000 -0600
@@ -0,0 +1,169 @@
+/*
+ * linux/mm/file_tail.c
+ *
+ * Copyright (C) International Business Machines Corp., 2006-2007
+ * Author: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
+ */
+
+/*
+ * VM File Tails are used to compactly store the data at the end of the
+ * file in a small SLAB-allocated buffer when the base page size is large.
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/fs.h>
+#include <linux/hardirq.h>
+#include <linux/vm_file_tail.h>
+
+/*
+ * Free the file tail
+ *
+ * Don't worry about a race. It's essentially a no-op if mapping->tail
+ * is NULL.
+ */
+void __vm_file_tail_free(struct address_space *mapping)
+{
+ unsigned long flags;
+ void *tail;
+
+ spin_lock_irqsave(&mapping->tail_lock, flags);
+ tail = mapping->tail;
+ mapping->tail = NULL;
+ spin_unlock_irqrestore(&mapping->tail_lock, flags);
+ kfree(tail);
+}
+
+/*
+ * Unpack tail into page cache.
+ *
+ * The tail is never modified, and can be safely discarded on error
+ */
+void vm_file_tail_unpack(struct address_space *mapping)
+{
+ unsigned long flags;
+ gfp_t gfp_mask;
+ pgoff_t index;
+ void *kaddr;
+ int length;
+ struct page *page;
+ void *tail;
+
+ if (!mapping->tail)
+ return;
+
+ /* Allocate page */
+
+ if (in_atomic())
+ gfp_mask = GFP_NOWAIT;
+ else
+ gfp_mask = mapping_gfp_mask(mapping);
+
+ page = __page_cache_alloc(gfp_mask);
+
+ /* Copy data from tail to new page */
+ if (page) {
+ spin_lock_irqsave(&mapping->tail_lock, flags);
+ index = vm_file_tail_index(mapping);
+ length = vm_file_tail_length(mapping);
+ tail = mapping->tail;
+ mapping->tail = NULL;
+ spin_unlock_irqrestore(&mapping->tail_lock, flags);
+
+ if (!tail) { /* someone else freed the tail */
+ page_cache_release(page);
+ return;
+ }
+
+ kaddr = kmap_atomic(page, KM_USER0);
+ memcpy(kaddr, tail, length);
+ memset(kaddr + length, 0, PAGE_CACHE_SIZE - length);
+ kunmap_atomic(kaddr, KM_USER0);
+
+ kfree(tail);
+
+ add_to_page_cache_lru(page, mapping, index, gfp_mask);
+ unlock_page(page);
+ page_cache_release(page);
+ } else
+ /* Free the tail */
+ __vm_file_tail_free(mapping);
+}
+
+static int page_not_eligible(struct page *page)
+{
+ if (!page->mapping || page->mapping->tail)
+ return 1;
+
+ if (PageDirty(page) || !PageUptodate(page) || PageWriteback(page))
+ return 1;
+
+ if ((page_count(page) > 2) || mapping_mapped(page->mapping) ||
+ PageSwapCache(page))
+ return 1;
+
+ return 0;
+}
+
+/* Determine if the page is eligible to be packed, and if so, pack it
+ *
+ * Non-fatal if this fails. The page will remain in the page cache.
+ *
+ * Returns 1 if the page was packed, 0 otherwise
+ */
+int vm_file_tail_pack(struct page *page)
+{
+ unsigned long flags;
+ pgoff_t index;
+ void *kaddr;
+ int length, ret = 0;
+ struct address_space *mapping;
+ void *tail;
+
+ if (TestSetPageLocked(page))
+ return 0;
+
+ if (page_not_eligible(page))
+ goto out;
+
+ mapping = page->mapping;
+ index = vm_file_tail_index(mapping);
+ length = vm_file_tail_length(mapping);
+
+ if ((index != page->index) ||
+ (length > PAGE_CACHE_SIZE / 2))
+ goto out;
+
+ if (PagePrivate(page) && !try_to_release_page(page, 0))
+ goto out;
+
+ tail = kmalloc(length, GFP_NOWAIT);
+ if (!tail)
+ goto out;
+
+ kaddr = kmap_atomic(page, KM_USER0);
+ memcpy(tail, kaddr, length);
+ kunmap_atomic(kaddr, KM_USER0);
+
+ spin_lock_irqsave(&mapping->tail_lock, flags);
+
+ /* Check again under spinlock */
+ if (mapping->tail || (index != vm_file_tail_index(mapping)) ||
+ (length != vm_file_tail_length(mapping))) {
+ /* File size must have changed */
+ spin_unlock_irqrestore(&mapping->tail_lock, flags);
+ kfree(tail);
+ goto out;
+ }
+
+ mapping->tail = tail;
+
+ spin_unlock_irqrestore(&mapping->tail_lock, flags);
+
+ remove_from_page_cache(page);
+ page_cache_release(page); /* pagecache ref */
+ ret = 1;
+
+out:
+ unlock_page(page);
+ return ret;
+}
* [RFC:PATCH 03/09] Release tail when inode is freed
From: Dave Kleikamp @ 2007-11-08 19:47 UTC (permalink / raw)
To: linux-fsdevel, linux-mm
Release tail when inode is freed
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---
fs/inode.c | 2 ++
1 file changed, 2 insertions(+)
diff -Nurp linux002/fs/inode.c linux003/fs/inode.c
--- linux002/fs/inode.c 2007-11-08 10:49:46.000000000 -0600
+++ linux003/fs/inode.c 2007-11-08 10:49:46.000000000 -0600
@@ -10,6 +10,7 @@
#include <linux/init.h>
#include <linux/quotaops.h>
#include <linux/slab.h>
+#include <linux/vm_file_tail.h>
#include <linux/writeback.h>
#include <linux/module.h>
#include <linux/backing-dev.h>
@@ -260,6 +261,7 @@ void __iget(struct inode * inode)
void clear_inode(struct inode *inode)
{
might_sleep();
+ vm_file_tail_free(inode->i_mapping);
invalidate_inode_buffers(inode);
BUG_ON(inode->i_data.nrpages);
* [RFC:PATCH 04/09] Unpack or remove file tail when inode is resized
From: Dave Kleikamp @ 2007-11-08 19:47 UTC (permalink / raw)
To: linux-fsdevel, linux-mm
Unpack or remove file tail when inode is resized
If the inode size grows, we need to unpack the tail into a page.
If the inode shrinks, such that the entire tail is beyond the end of the
file, discard the tail. If the file shrinks, but part of the tail is still
valid, just leave it.
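The three cases above reduce to a small decision function. The following is a userspace sketch, assuming a 64 KB page; the enum and function names are hypothetical and only mirror the comparison made in the patch below.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 16 /* assumed 64 KB base page */

enum sketch_resize_action {
    SKETCH_TAIL_KEEP,   /* file shrank but the tail page is still the last page */
    SKETCH_TAIL_UNPACK, /* file grew: move the tail back into a real page */
    SKETCH_TAIL_FREE    /* file shrank past the tail page: discard the tail */
};

static enum sketch_resize_action
sketch_resize_action(unsigned long long old_size, unsigned long long new_size)
{
    if (new_size > old_size)
        return SKETCH_TAIL_UNPACK;
    if ((new_size >> SKETCH_PAGE_SHIFT) != (old_size >> SKETCH_PAGE_SHIFT))
        return SKETCH_TAIL_FREE;
    return SKETCH_TAIL_KEEP;
}
```

In the keep case nothing is copied: the tail length is always derived from the current file size, so a shorter file simply exposes fewer bytes of the same buffer.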
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---
include/linux/fs.h | 14 ++++++++++++++
mm/file_tail.c | 11 +++++++++++
2 files changed, 25 insertions(+)
diff -Nurp linux003/include/linux/fs.h linux004/include/linux/fs.h
--- linux003/include/linux/fs.h 2007-11-08 10:49:46.000000000 -0600
+++ linux004/include/linux/fs.h 2007-11-08 10:49:46.000000000 -0600
@@ -715,6 +715,19 @@ static inline loff_t i_size_read(const s
#endif
}
+#ifdef CONFIG_VM_FILE_TAILS
+void __vm_file_tail_unpack_on_resize(struct inode *, loff_t);
+
+static inline void vm_file_tail_unpack_on_resize(struct inode *inode,
+ loff_t size)
+{
+ if (inode->i_mapping && inode->i_mapping->tail)
+ __vm_file_tail_unpack_on_resize(inode, size);
+}
+#else
+#define vm_file_tail_unpack_on_resize(mapping, new_size) do {} while (0)
+#endif
+
/*
* NOTE: unlike i_size_read(), i_size_write() does need locking around it
* (normally i_mutex), otherwise on 32bit/SMP an update of i_size_seqcount
@@ -722,6 +735,7 @@ static inline loff_t i_size_read(const s
*/
static inline void i_size_write(struct inode *inode, loff_t i_size)
{
+ vm_file_tail_unpack_on_resize(inode, i_size);
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
write_seqcount_begin(&inode->i_size_seqcount);
inode->i_size = i_size;
diff -Nurp linux003/mm/file_tail.c linux004/mm/file_tail.c
--- linux003/mm/file_tail.c 2007-11-08 10:49:46.000000000 -0600
+++ linux004/mm/file_tail.c 2007-11-08 10:49:46.000000000 -0600
@@ -13,6 +13,7 @@
#include <linux/buffer_head.h>
#include <linux/fs.h>
#include <linux/hardirq.h>
+#include <linux/module.h>
#include <linux/vm_file_tail.h>
/*
@@ -167,3 +168,13 @@ out:
unlock_page(page);
return ret;
}
+
+void __vm_file_tail_unpack_on_resize(struct inode *inode, loff_t new_size)
+{
+ loff_t old_size = i_size_read(inode);
+ if (new_size > old_size)
+ vm_file_tail_unpack(inode->i_mapping);
+ else if (new_size >> PAGE_CACHE_SHIFT != old_size >> PAGE_CACHE_SHIFT)
+ vm_file_tail_free(inode->i_mapping);
+}
+EXPORT_SYMBOL(__vm_file_tail_unpack_on_resize);
* [RFC:PATCH 05/09] find_get_page() and find_lock_page() need to unpack the tail
From: Dave Kleikamp @ 2007-11-08 19:47 UTC (permalink / raw)
To: linux-fsdevel, linux-mm
find_get_page() and find_lock_page() need to unpack the tail
If the page being sought corresponds to the tail, and the tail is packed
in the inode, the tail must be unpacked.
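As a rough illustration, the test a lookup has to make before consulting the page cache looks like the sketch below. It is a userspace model with a hypothetical name and an assumed 64 KB page; in the patch the same check lives in vm_file_tail_unpack_index().

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 16 /* assumed 64 KB base page */

/*
 * Returns nonzero when a lookup of page 'index' must first unpack the tail:
 * a tail is currently packed and 'index' is the page the tail belongs to.
 */
static int sketch_lookup_needs_unpack(int tail_packed,
                                      unsigned long long file_size,
                                      unsigned long long index)
{
    assert(tail_packed == 0 || tail_packed == 1);
    return tail_packed && index == (file_size >> SKETCH_PAGE_SHIFT);
}
```

Lookups of any other index skip the unpack entirely, so the extra cost on the common path is one pointer test.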
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---
mm/filemap.c | 3 +++
1 file changed, 3 insertions(+)
diff -Nurp linux004/mm/filemap.c linux005/mm/filemap.c
--- linux004/mm/filemap.c 2007-11-07 08:14:01.000000000 -0600
+++ linux005/mm/filemap.c 2007-11-08 10:49:46.000000000 -0600
@@ -24,6 +24,7 @@
#include <linux/file.h>
#include <linux/uio.h>
#include <linux/hash.h>
+#include <linux/vm_file_tail.h>
#include <linux/writeback.h>
#include <linux/backing-dev.h>
#include <linux/pagevec.h>
@@ -600,6 +601,7 @@ struct page * find_get_page(struct addre
{
struct page *page;
+ vm_file_tail_unpack_index(mapping, offset);
read_lock_irq(&mapping->tree_lock);
page = radix_tree_lookup(&mapping->page_tree, offset);
if (page)
@@ -624,6 +626,7 @@ struct page *find_lock_page(struct addre
{
struct page *page;
+ vm_file_tail_unpack_index(mapping, offset);
repeat:
read_lock_irq(&mapping->tree_lock);
page = radix_tree_lookup(&mapping->page_tree, offset);
* [RFC:PATCH 06/09] For readahead, leave data in tail
From: Dave Kleikamp @ 2007-11-08 19:47 UTC (permalink / raw)
To: linux-fsdevel, linux-mm
For readahead, leave data in tail
Don't unpack it until it's actually read.
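A simplified model of the effect on the readahead loop, as a userspace sketch: it counts how many pages a request would bring in, stopping before the last page when the tail is packed. The function name is hypothetical, a 64 KB page is assumed, the last-page index is taken to be (isize - 1) >> shift as an assumption about the surrounding readahead code, and pages already present in the cache are ignored.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 16 /* assumed 64 KB base page */

/* Number of pages a readahead of nr_to_read pages from 'start' would read. */
static unsigned long sketch_readahead_pages(unsigned long long isize,
                                            unsigned long start,
                                            unsigned long nr_to_read,
                                            int tail_packed)
{
    unsigned long end_index, i, nr = 0;

    assert(tail_packed == 0 || tail_packed == 1);
    if (isize == 0)
        return 0;
    end_index = (unsigned long)((isize - 1) >> SKETCH_PAGE_SHIFT);

    for (i = 0; i < nr_to_read; i++) {
        unsigned long page_offset = start + i;

        if (page_offset > end_index)
            break;          /* past end of file */
        if (page_offset == end_index && tail_packed)
            break;          /* last page is already held in the tail */
        nr++;
    }
    return nr;
}
```

For a file of three full pages plus 100 bytes, a readahead from page 0 reads four pages normally and three when the tail is packed.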
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---
mm/readahead.c | 5 +++++
1 file changed, 5 insertions(+)
diff -Nurp linux005/mm/readahead.c linux006/mm/readahead.c
--- linux005/mm/readahead.c 2007-11-07 08:14:01.000000000 -0600
+++ linux006/mm/readahead.c 2007-11-08 10:49:46.000000000 -0600
@@ -16,6 +16,7 @@
#include <linux/task_io_accounting_ops.h>
#include <linux/pagevec.h>
#include <linux/pagemap.h>
+#include <linux/vm_file_tail.h>
void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
{
@@ -147,6 +148,10 @@ __do_page_cache_readahead(struct address
if (page_offset > end_index)
break;
+ if ((page_offset == end_index) && vm_file_tail_packed(mapping))
+ /* Tail page is already packed */
+ break;
+
rcu_read_lock();
page = radix_tree_lookup(&mapping->page_tree, page_offset);
rcu_read_unlock();
* [RFC:PATCH 07/09] shrink_active_list: pack file tails rather than move to inactive list
From: Dave Kleikamp @ 2007-11-08 19:47 UTC (permalink / raw)
To: linux-fsdevel, linux-mm
The big question is how aggressively we pack the tails. This looked like
an easy place to start. If a page is being moved from the active list to
the inactive list and its tail can be safely packed (the page is not mapped,
not dirty, etc.), the tail is packed and the page is removed from the page
cache.
Right now, pages that never get off the inactive list will not be packed.
I will be soliciting ideas for other places in the code where tails can
be packed. One of my goals is to avoid being so aggressive that tails are
packed and unpacked repeatedly. I also don't want to add too much overhead,
such as an extra scan of the inactive list.
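The "can be safely packed" test can be sketched as a single predicate. This is a userspace model, not the kernel code: struct sketch_page and its fields are hypothetical stand-ins for the page flags, and the reference-count and half-page limits are the ones used by page_not_eligible() and vm_file_tail_pack() in patch 02, with a 64 KB page assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 16 /* assumed 64 KB base page */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical snapshot of the page state relevant to packing. */
struct sketch_page {
    unsigned long index;  /* page index within the file */
    int refcount;         /* references currently held on the page */
    bool dirty;
    bool uptodate;
    bool under_writeback;
    bool mapped;          /* file is mapped into some address space */
};

static bool sketch_can_pack(const struct sketch_page *p,
                            unsigned long long file_size,
                            bool tail_already_packed)
{
    unsigned long tail_index = (unsigned long)(file_size >> SKETCH_PAGE_SHIFT);
    unsigned long tail_len = (unsigned long)(file_size & (SKETCH_PAGE_SIZE - 1));

    assert(p);
    if (tail_already_packed)
        return false;
    if (p->dirty || !p->uptodate || p->under_writeback || p->mapped)
        return false;
    if (p->refcount > 2)            /* someone else is using the page */
        return false;
    if (p->index != tail_index)     /* not the last page of the file */
        return false;
    return tail_len <= SKETCH_PAGE_SIZE / 2;
}
```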
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---
mm/vmscan.c | 6 ++++++
1 file changed, 6 insertions(+)
diff -Nurp linux006/mm/vmscan.c linux007/mm/vmscan.c
--- linux006/mm/vmscan.c 2007-11-07 08:14:01.000000000 -0600
+++ linux007/mm/vmscan.c 2007-11-08 10:49:46.000000000 -0600
@@ -19,6 +19,7 @@
#include <linux/pagemap.h>
#include <linux/init.h>
#include <linux/highmem.h>
+#include <linux/vm_file_tail.h>
#include <linux/vmstat.h>
#include <linux/file.h>
#include <linux/writeback.h>
@@ -1035,7 +1036,12 @@ force_reclaim_mapped:
list_add(&page->lru, &l_active);
continue;
}
+ } else if (vm_file_tail_pack(page)) {
+ ClearPageActive(page);
+ page_cache_release(page);
+ continue;
}
+
list_add(&page->lru, &l_inactive);
}
* [RFC:PATCH 08/09] generic_file_aio_read can read directly from the tail. No need to unpack
From: Dave Kleikamp @ 2007-11-08 19:48 UTC (permalink / raw)
To: linux-fsdevel, linux-mm
generic_file_aio_read can read directly from the tail. No need to unpack
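The bounds arithmetic of the direct read can be sketched in userspace as follows. The struct and function names are hypothetical, a 64 KB page is assumed, memcpy() stands in for the copy to user space, and unlike the patch (which returns 1 or 0 and updates the read descriptor) this sketch returns the number of bytes copied, with 0 meaning "fall back to the normal read path".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 16 /* assumed 64 KB base page */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for a file with an optional packed tail. */
struct sketch_file {
    unsigned long long size; /* file size in bytes */
    const char *tail;        /* packed tail, or NULL */
};

/* Copy up to 'count' bytes at file position 'pos' straight from the tail. */
static size_t sketch_tail_read(const struct sketch_file *f,
                               unsigned long long pos,
                               char *buf, size_t count)
{
    unsigned long long index = pos >> SKETCH_PAGE_SHIFT;
    size_t offset = (size_t)(pos & (SKETCH_PAGE_SIZE - 1));
    size_t length = (size_t)(f->size & (SKETCH_PAGE_SIZE - 1));
    size_t n;

    assert(f && buf);
    /* Not packed, not the tail page, or at/after EOF: use the normal path. */
    if (!f->tail || index != (f->size >> SKETCH_PAGE_SHIFT) || offset >= length)
        return 0;

    n = length - offset;
    if (n > count)
        n = count;
    memcpy(buf, f->tail + offset, n);
    return n;
}
```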
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---
include/linux/vm_file_tail.h | 13 ++++++++++
mm/file_tail.c | 54 +++++++++++++++++++++++++++++++++++++++++++
mm/filemap.c | 4 ++-
3 files changed, 70 insertions(+), 1 deletion(-)
diff -Nurp linux007/include/linux/vm_file_tail.h linux008/include/linux/vm_file_tail.h
--- linux007/include/linux/vm_file_tail.h 2007-11-08 10:49:46.000000000 -0600
+++ linux008/include/linux/vm_file_tail.h 2007-11-08 10:49:46.000000000 -0600
@@ -53,6 +53,18 @@ static inline void vm_file_tail_unpack_i
vm_file_tail_unpack(mapping);
}
+extern int __vm_file_tail_read(struct file *, loff_t *, read_descriptor_t *);
+
+static inline int vm_file_tail_read(struct file *filp, loff_t *ppos,
+ read_descriptor_t *desc)
+{
+ struct address_space *mapping = filp->f_mapping;
+ unsigned long index = *ppos >> PAGE_CACHE_SHIFT;
+
+ if (mapping->tail && index == vm_file_tail_index(mapping))
+ return __vm_file_tail_read(filp, ppos, desc);
+ return 0;
+}
#else /* !CONFIG_VM_FILE_TAILS */
#define vm_file_tail_packed(mapping) 0
@@ -60,6 +72,7 @@ static inline void vm_file_tail_unpack_i
#define vm_file_tail_pack(page) 0
#define vm_file_tail_unpack(mapping) do {} while (0)
#define vm_file_tail_unpack_index(mapping, index) do {} while (0)
+#define vm_file_tail_read(filp, ppos, desc) 0
#endif /* CONFIG_VM_FILE_TAILS */
diff -Nurp linux007/mm/file_tail.c linux008/mm/file_tail.c
--- linux007/mm/file_tail.c 2007-11-08 10:49:46.000000000 -0600
+++ linux008/mm/file_tail.c 2007-11-08 10:49:46.000000000 -0600
@@ -178,3 +178,57 @@ void __vm_file_tail_unpack_on_resize(str
vm_file_tail_free(inode->i_mapping);
}
EXPORT_SYMBOL(__vm_file_tail_unpack_on_resize);
+
+/*
+ * Copy tail data to user buffer
+ *
+ * Returns 1 on success
+ */
+int __vm_file_tail_read(struct file *filp, loff_t *ppos,
+ read_descriptor_t *desc)
+{
+ unsigned long count = desc->count;
+ unsigned long flags;
+ unsigned long left;
+ struct address_space *mapping = filp->f_mapping;
+ unsigned long offset;
+ unsigned long index = *ppos >> PAGE_CACHE_SHIFT;
+ unsigned long size;
+
+ if (fault_in_pages_writeable(desc->arg.buf, count))
+ /*
+ * Keep this simple since this path is an optimization. Let
+ * the tricky stuff get handled in the fallback path.
+ */
+ return 0;
+
+ spin_lock_irqsave(&mapping->tail_lock, flags);
+
+ offset = *ppos & ~PAGE_CACHE_MASK;
+ if (!mapping->tail || index != vm_file_tail_index(mapping) ||
+ offset >= vm_file_tail_length(mapping)) {
+ spin_unlock_irqrestore(&mapping->tail_lock, flags);
+ return 0;
+ }
+
+ size = vm_file_tail_length(mapping) - offset;
+ if (size > count)
+ size = count;
+
+ left = __copy_to_user_inatomic(desc->arg.buf,
+ (char *)mapping->tail + offset, size);
+
+ spin_unlock_irqrestore(&mapping->tail_lock, flags);
+
+ if (left) {
+ size -= left;
+ desc->error = -EFAULT;
+ }
+ desc->count = count - size;
+ desc->written += size;
+ desc->arg.buf += size;
+ *ppos += size;
+ file_accessed(filp);
+
+ return 1;
+}
diff -Nurp linux007/mm/filemap.c linux008/mm/filemap.c
--- linux007/mm/filemap.c 2007-11-08 10:49:46.000000000 -0600
+++ linux008/mm/filemap.c 2007-11-08 10:49:46.000000000 -0600
@@ -1195,7 +1195,9 @@ generic_file_aio_read(struct kiocb *iocb
if (desc.count == 0)
continue;
desc.error = 0;
- do_generic_file_read(filp,ppos,&desc,file_read_actor);
+ if (!vm_file_tail_read(filp, ppos, &desc))
+ do_generic_file_read(filp, ppos, &desc,
+ file_read_actor);
retval += desc.written;
if (desc.error) {
retval = retval ?: desc.error;
* [RFC:PATCH 09/09] VM tail statistics support
From: Dave Kleikamp, Luiz Fernando N. Capitulino @ 2007-11-08 19:48 UTC (permalink / raw)
To: linux-fsdevel, linux-mm
[PATCH]: VM tail statistics support
This patch is a hack which introduces initial statistics support
for the VM tail functionality.
It uses debugfs and does accounting of:
1. Number of times vm_file_tail_pack() has been called
2. Number of times vm_file_tail_unpack() has been called
3. Total size of file tail allocations
4. Number of file tail allocations
5. Bytes saved
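The allocation counters (items 3 to 5) move together each time a tail is created or released. Below is a single-threaded userspace sketch of that bookkeeping, with hypothetical names and an assumed 64 KB page; the patch guards the same updates with a spinlock, which is omitted here. Note that with 32-bit counters and 64 KB pages the saved-bytes total can wrap after several tens of thousands of tails.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 65536u /* assumed 64 KB base page */

/* Hypothetical mirror of the allocation counters; call counters omitted. */
struct sketch_tail_stats {
    uint32_t nr_tails;    /* tails currently allocated */
    uint32_t tail_size;   /* total bytes held in tails */
    uint32_t saved_bytes; /* page bytes avoided, ignoring slab rounding */
};

/* Account for a newly packed tail of 'length' bytes. */
static void sketch_stats_inc(struct sketch_tail_stats *s, uint32_t length)
{
    assert(length <= SKETCH_PAGE_SIZE);
    s->nr_tails++;
    s->tail_size += length;
    s->saved_bytes += SKETCH_PAGE_SIZE - length;
}

/* Account for a tail of 'length' bytes being unpacked or freed. */
static void sketch_stats_dec(struct sketch_tail_stats *s, uint32_t length)
{
    assert(length <= SKETCH_PAGE_SIZE);
    s->nr_tails--;
    s->tail_size -= length;
    s->saved_bytes -= SKETCH_PAGE_SIZE - length;
}
```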
Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
---
mm/file_tail.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 126 insertions(+), 1 deletion(-)
diff -Nurp linux008/mm/file_tail.c linux009/mm/file_tail.c
--- linux008/mm/file_tail.c 2007-11-08 10:49:46.000000000 -0600
+++ linux009/mm/file_tail.c 2007-11-08 10:49:46.000000000 -0600
@@ -12,10 +12,51 @@
#include <linux/buffer_head.h>
#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/debugfs.h>
#include <linux/hardirq.h>
#include <linux/module.h>
+#include <linux/spinlock.h>
#include <linux/vm_file_tail.h>
+struct {
+ struct dentry *root_dir;
+ struct dentry *nr_tails;
+ struct dentry *tail_size;
+ struct dentry *saved_bytes;
+ struct dentry *pack_called;
+ struct dentry *unpack_called;
+ struct dentry *read_called;
+} vm_tail_debugfs;
+
+struct {
+ u32 nr_tails;
+ u32 tail_size;
+ u32 saved_bytes;
+ u32 pack_called;
+ u32 unpack_called;
+ u32 read_called;
+ spinlock_t lock;
+} vm_tail_stats = { .lock = __SPIN_LOCK_UNLOCKED(lock) };
+
+static void vm_file_tail_stats_inc(int length)
+{
+ spin_lock(&vm_tail_stats.lock);
+ vm_tail_stats.nr_tails++;
+ vm_tail_stats.tail_size += length;
+ vm_tail_stats.saved_bytes += (PAGE_SIZE - length);
+ spin_unlock(&vm_tail_stats.lock);
+}
+
+static void vm_file_tail_stats_dec(int length)
+{
+ spin_lock(&vm_tail_stats.lock);
+ vm_tail_stats.nr_tails--;
+ vm_tail_stats.tail_size -= length;
+ vm_tail_stats.saved_bytes -= (PAGE_SIZE - length);
+ spin_unlock(&vm_tail_stats.lock);
+}
+
/*
* Free the file tail
*
@@ -29,7 +70,10 @@ void __vm_file_tail_free(struct address_
spin_lock_irqsave(&mapping->tail_lock, flags);
tail = mapping->tail;
- mapping->tail = NULL;
+ if (tail) {
+ vm_file_tail_stats_dec(vm_file_tail_length(mapping));
+ mapping->tail = NULL;
+ }
spin_unlock_irqrestore(&mapping->tail_lock, flags);
kfree(tail);
}
@@ -49,6 +93,8 @@ void vm_file_tail_unpack(struct address_
struct page *page;
void *tail;
+ vm_tail_stats.unpack_called++;
+
if (!mapping->tail)
return;
@@ -85,6 +131,7 @@ void vm_file_tail_unpack(struct address_
add_to_page_cache_lru(page, mapping, index, gfp_mask);
unlock_page(page);
page_cache_release(page);
+ vm_file_tail_stats_dec(length);
} else
/* Free the tail */
__vm_file_tail_free(mapping);
@@ -120,6 +167,8 @@ int vm_file_tail_pack(struct page *page)
struct address_space *mapping;
void *tail;
+ vm_tail_stats.pack_called++;
+
if (TestSetPageLocked(page))
return 0;
@@ -163,12 +212,86 @@ int vm_file_tail_pack(struct page *page)
remove_from_page_cache(page);
page_cache_release(page); /* pagecache ref */
ret = 1;
+ vm_file_tail_stats_inc(length);
out:
unlock_page(page);
return ret;
}
+static int __init create_debugfs_file(const char *name, struct dentry **dir,
+ u32 *var)
+{
+ *dir = debugfs_create_u32(name, S_IFREG|S_IRUGO,
+ vm_tail_debugfs.root_dir, var);
+ if (!*dir) {
+ printk(KERN_ERR "ERROR: vm_tail: could not create %s\n", name);
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+static int __init vm_file_tail_init(void)
+{
+ int err;
+
+ vm_tail_debugfs.root_dir = debugfs_create_dir("vm_tail", NULL);
+ if (!vm_tail_debugfs.root_dir) {
+ printk(KERN_ERR "ERROR: %s Could not create root directory\n",
+ __FUNCTION__);
+ return -ENOMEM;
+ }
+
+ err = create_debugfs_file("nr_tails", &vm_tail_debugfs.nr_tails,
+ &vm_tail_stats.nr_tails);
+ if (err)
+ goto out_err;
+
+ err = create_debugfs_file("tail_size", &vm_tail_debugfs.tail_size,
+ &vm_tail_stats.tail_size);
+ if (err)
+ goto out_err1;
+
+ err = create_debugfs_file("saved_bytes", &vm_tail_debugfs.saved_bytes,
+ &vm_tail_stats.saved_bytes);
+ if (err)
+ goto out_err2;
+
+ err = create_debugfs_file("unpack_called",
+ &vm_tail_debugfs.unpack_called,
+ &vm_tail_stats.unpack_called);
+ if (err)
+ goto out_err3;
+
+ err = create_debugfs_file("pack_called", &vm_tail_debugfs.pack_called,
+ &vm_tail_stats.pack_called);
+ if (err)
+ goto out_err4;
+
+ err = create_debugfs_file("read_called", &vm_tail_debugfs.read_called,
+ &vm_tail_stats.read_called);
+ if (err)
+ goto out_err5;
+
+ return 0;
+
+out_err5:
+ debugfs_remove(vm_tail_debugfs.pack_called);
+out_err4:
+ debugfs_remove(vm_tail_debugfs.unpack_called);
+out_err3:
+ debugfs_remove(vm_tail_debugfs.saved_bytes);
+out_err2:
+ debugfs_remove(vm_tail_debugfs.tail_size);
+out_err1:
+ debugfs_remove(vm_tail_debugfs.nr_tails);
+out_err:
+ debugfs_remove(vm_tail_debugfs.root_dir);
+ return err;
+}
+
+postcore_initcall(vm_file_tail_init);
+
void __vm_file_tail_unpack_on_resize(struct inode *inode, loff_t new_size)
{
loff_t old_size = i_size_read(inode);
@@ -211,6 +334,8 @@ int __vm_file_tail_read(struct file *fil
return 0;
}
+ vm_tail_stats.read_called++;
+
size = vm_file_tail_length(mapping) - offset;
if (size > count)
size = count;