* [RFC v1 1/6] gcma: introduce contiguous memory allocator
2014-11-11 15:00 [RFC v1 0/6] introduce gcma SeongJae Park
@ 2014-11-11 15:00 ` SeongJae Park
2014-11-11 15:00 ` [RFC v1 2/6] gcma: utilize reserved memory as swap cache SeongJae Park
` (5 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2014-11-11 15:00 UTC (permalink / raw)
To: akpm; +Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, SeongJae Park
This patch introduces a simple contiguous memory allocator: a bitmap-based
allocator that manages a reserved contiguous memory area.
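As a quick reference, a minimal usage sketch of the interface added here could
look like the snippet below. The wrapper functions and the single module-wide
area are hypothetical illustrations, not part of this patch, and error handling
is trimmed:
```
#include <linux/gcma.h>

static struct gcma *my_gcma;	/* hypothetical: one area for this example */

/* Hand a previously reserved, page-backed PFN range over to gcma. */
static int __init my_gcma_setup(unsigned long start_pfn, unsigned long nr_pages)
{
	/* Builds the bitmap that tracks per-page allocation state. */
	return gcma_init(start_pfn, nr_pages, &my_gcma);
}

/* Claim a sub-range of the area, use it, then give it back. */
static int my_use_contig(unsigned long start_pfn, unsigned long nr_pages)
{
	int ret;

	ret = gcma_alloc_contig(my_gcma, start_pfn, nr_pages);
	if (ret)
		return ret;

	/* ... use pfn_to_page(start_pfn) .. pfn_to_page(start_pfn + nr_pages - 1) ... */

	gcma_free_contig(my_gcma, start_pfn, nr_pages);
	return 0;
}
```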
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
include/linux/gcma.h | 26 ++++++++
mm/gcma.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 199 insertions(+)
create mode 100644 include/linux/gcma.h
create mode 100644 mm/gcma.c
diff --git a/include/linux/gcma.h b/include/linux/gcma.h
new file mode 100644
index 0000000..3016968
--- /dev/null
+++ b/include/linux/gcma.h
@@ -0,0 +1,26 @@
+/*
+ * gcma.h - Guaranteed Contiguous Memory Allocator
+ *
+ * GCMA aims for contiguous memory allocation with success and fast
+ * latency guarantee.
+ * It reserves large amount of memory and let it be allocated to the
+ * contiguous memory request.
+ *
+ * Copyright (C) 2014 LG Electronics Inc.,
+ * Copyright (C) 2014 Minchan Kim <minchan@kernel.org>
+ * Copyright (C) 2014 SeongJae Park <sj38.park@gmail.com>
+ */
+
+#ifndef _LINUX_GCMA_H
+#define _LINUX_GCMA_H
+
+struct gcma;
+
+int gcma_init(unsigned long start_pfn, unsigned long size,
+ struct gcma **res_gcma);
+int gcma_alloc_contig(struct gcma *gcma,
+ unsigned long start_pfn, unsigned long size);
+void gcma_free_contig(struct gcma *gcma,
+ unsigned long start_pfn, unsigned long size);
+
+#endif /* _LINUX_GCMA_H */
diff --git a/mm/gcma.c b/mm/gcma.c
new file mode 100644
index 0000000..20a8473
--- /dev/null
+++ b/mm/gcma.c
@@ -0,0 +1,173 @@
+/*
+ * gcma.c - Guaranteed Contiguous Memory Allocator
+ *
+ * GCMA aims for contiguous memory allocation with success and fast
+ * latency guarantee.
+ * It reserves large amount of memory and let it be allocated to the
+ * contiguous memory request.
+ *
+ * Copyright (C) 2014 LG Electronics Inc.,
+ * Copyright (C) 2014 Minchan Kim <minchan@kernel.org>
+ * Copyright (C) 2014 SeongJae Park <sj38.park@gmail.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/gcma.h>
+
+struct gcma {
+ spinlock_t lock;
+ unsigned long *bitmap;
+ unsigned long base_pfn, size;
+ struct list_head list;
+};
+
+struct gcma_info {
+ spinlock_t lock; /* protect list */
+ struct list_head head;
+};
+
+static struct gcma_info ginfo = {
+ .head = LIST_HEAD_INIT(ginfo.head),
+ .lock = __SPIN_LOCK_UNLOCKED(ginfo.lock),
+};
+
+/*
+ * gcma_init - initializes a contiguous memory area
+ *
+ * @start_pfn start pfn of contiguous memory area
+ * @size number of pages in the contiguous memory area
+ * @res_gcma pointer to store the created gcma region
+ *
+ * Returns 0 on success, error code on failure.
+ */
+int gcma_init(unsigned long start_pfn, unsigned long size,
+ struct gcma **res_gcma)
+{
+ int bitmap_size = BITS_TO_LONGS(size) * sizeof(long);
+ struct gcma *gcma;
+
+ gcma = kmalloc(sizeof(*gcma), GFP_KERNEL);
+ if (!gcma)
+ goto out;
+
+ gcma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+ if (!gcma->bitmap)
+ goto free_cma;
+
+ gcma->size = size;
+ gcma->base_pfn = start_pfn;
+ spin_lock_init(&gcma->lock);
+
+ spin_lock(&ginfo.lock);
+ list_add(&gcma->list, &ginfo.head);
+ spin_unlock(&ginfo.lock);
+
+ *res_gcma = gcma;
+ pr_info("initialized gcma area [%lu, %lu]\n",
+ start_pfn, start_pfn + size);
+ return 0;
+
+free_cma:
+ kfree(gcma);
+out:
+ return -ENOMEM;
+}
+
+static struct page *gcma_alloc_page(struct gcma *gcma)
+{
+ unsigned long bit;
+ unsigned long *bitmap = gcma->bitmap;
+ struct page *page = NULL;
+
+ spin_lock(&gcma->lock);
+ bit = bitmap_find_next_zero_area(bitmap, gcma->size, 0, 1, 0);
+ if (bit >= gcma->size) {
+ spin_unlock(&gcma->lock);
+ goto out;
+ }
+
+ bitmap_set(bitmap, bit, 1);
+ page = pfn_to_page(gcma->base_pfn + bit);
+ spin_unlock(&gcma->lock);
+
+out:
+ return page;
+}
+
+static void gcma_free_page(struct gcma *gcma, struct page *page)
+{
+ unsigned long pfn, offset;
+
+ pfn = page_to_pfn(page);
+
+ spin_lock(&gcma->lock);
+ offset = pfn - gcma->base_pfn;
+
+ bitmap_clear(gcma->bitmap, offset, 1);
+ spin_unlock(&gcma->lock);
+}
+
+/*
+ * gcma_alloc_contig - allocates contiguous pages
+ *
+ * @start_pfn start pfn of requiring contiguous memory area
+ * @size size of the requiring contiguous memory area
+ *
+ * Returns 0 on success, error code on failure.
+ */
+int gcma_alloc_contig(struct gcma *gcma, unsigned long start_pfn,
+ unsigned long size)
+{
+ unsigned long offset;
+
+ spin_lock(&gcma->lock);
+ offset = start_pfn - gcma->base_pfn;
+
+ if (bitmap_find_next_zero_area(gcma->bitmap, gcma->size, offset,
+ size, 0) != 0) {
+ spin_unlock(&gcma->lock);
+ pr_warn("already allocated region required: %lu, %lu",
+ start_pfn, size);
+ return -EINVAL;
+ }
+
+ bitmap_set(gcma->bitmap, offset, size);
+ spin_unlock(&gcma->lock);
+
+ return 0;
+}
+
+/*
+ * gcma_free_contig - free allocated contiguous pages
+ *
+ * @start_pfn start pfn of freeing contiguous memory area
+ * @size number of pages in freeing contiguous memory area
+ */
+void gcma_free_contig(struct gcma *gcma,
+ unsigned long start_pfn, unsigned long size)
+{
+ unsigned long offset;
+
+ spin_lock(&gcma->lock);
+ offset = start_pfn - gcma->base_pfn;
+ bitmap_clear(gcma->bitmap, offset, size);
+ spin_unlock(&gcma->lock);
+}
+
+static int __init init_gcma(void)
+{
+ pr_info("loading gcma\n");
+
+ return 0;
+}
+
+module_init(init_gcma);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Minchan Kim <minchan@kernel.org>");
+MODULE_AUTHOR("SeongJae Park <sj38.park@gmail.com>");
+MODULE_DESCRIPTION("Guaranteed Contiguous Memory Allocator");
--
1.9.1
* [RFC v1 2/6] gcma: utilize reserved memory as swap cache
2014-11-11 15:00 [RFC v1 0/6] introduce gcma SeongJae Park
2014-11-11 15:00 ` [RFC v1 1/6] gcma: introduce contiguous memory allocator SeongJae Park
@ 2014-11-11 15:00 ` SeongJae Park
2014-11-11 15:00 ` [RFC v1 3/6] gcma: evict frontswap pages in LRU order when memory is full SeongJae Park
` (4 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2014-11-11 15:00 UTC (permalink / raw)
To: akpm; +Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, SeongJae Park
GCMA reserves an amount of memory during boot, and that memory should always
be available to the guest of the area. However, the guest does not need it
all the time, so this patch uses the reserved memory as a swap cache via
write-through frontswap for memory efficiency.
Whenever the guest declares that it needs the area, we can discard the whole
swap cache because, with write-through frontswap, every piece of data is
already on the swap device; a later swap-in that misses in the cache simply
falls back to reading the swap device. This keeps allocation latency for the
guest very small.
The drawback of this approach is that it could degrade system performance due
to earlier swapout caused by the reservation if the user makes the GCMA area
big (e.g., 1/3 of system memory) and the swap-cache hit ratio is low.
It is a trade-off for guaranteed, low-latency contiguous memory allocation.
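As a rough sketch of the swap-out flow this relies on (my reading of
frontswap's write-through mode; simplified, not the exact call chain):
```
/*
 * Swap-out with write-through frontswap (simplified):
 *
 *   swap_writepage(page)
 *     -> frontswap_store(page)    gcma keeps a copy in the reserved area,
 *                                 but write-through mode makes the store
 *                                 report "not stored" to the swap layer
 *     -> normal swap writeback    so the page is still written to the
 *                                 swap device as usual
 *
 * Because an on-disk copy therefore always exists, gcma may drop its cached
 * copy at any moment; a swap-in that misses in frontswap simply reads the
 * page back from the swap device.
 */
```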
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
include/linux/gcma.h | 2 +-
mm/gcma.c | 330 ++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 330 insertions(+), 2 deletions(-)
diff --git a/include/linux/gcma.h b/include/linux/gcma.h
index 3016968..d733a9b 100644
--- a/include/linux/gcma.h
+++ b/include/linux/gcma.h
@@ -4,7 +4,7 @@
* GCMA aims for contiguous memory allocation with success and fast
* latency guarantee.
* It reserves large amount of memory and let it be allocated to the
- * contiguous memory request.
+ * contiguous memory request and utilize them as swap cache.
*
* Copyright (C) 2014 LG Electronics Inc.,
* Copyright (C) 2014 Minchan Kim <minchan@kernel.org>
diff --git a/mm/gcma.c b/mm/gcma.c
index 20a8473..ddfc0d8 100644
--- a/mm/gcma.c
+++ b/mm/gcma.c
@@ -4,7 +4,7 @@
* GCMA aims for contiguous memory allocation with success and fast
* latency guarantee.
* It reserves large amount of memory and let it be allocated to the
- * contiguous memory request.
+ * contiguous memory request and utilize as swap cache using frontswap.
*
* Copyright (C) 2014 LG Electronics Inc.,
* Copyright (C) 2014 Minchan Kim <minchan@kernel.org>
@@ -15,6 +15,7 @@
#include <linux/module.h>
#include <linux/slab.h>
+#include <linux/frontswap.h>
#include <linux/highmem.h>
#include <linux/gcma.h>
@@ -35,6 +36,42 @@ static struct gcma_info ginfo = {
.lock = __SPIN_LOCK_UNLOCKED(ginfo.lock),
};
+struct swap_slot_entry {
+ struct gcma *gcma;
+ struct rb_node rbnode;
+ pgoff_t offset;
+ struct page *page;
+ atomic_t refcount;
+};
+
+struct frontswap_tree {
+ struct rb_root rbroot;
+ spinlock_t lock;
+};
+
+static struct frontswap_tree *gcma_swap_trees[MAX_SWAPFILES];
+static struct kmem_cache *swap_slot_entry_cache;
+
+static struct frontswap_tree *swap_tree(struct page *page)
+{
+ return (struct frontswap_tree *)page->mapping;
+}
+
+static void set_swap_tree(struct page *page, struct frontswap_tree *tree)
+{
+ page->mapping = (struct address_space *)tree;
+}
+
+static struct swap_slot_entry *swap_slot(struct page *page)
+{
+ return (struct swap_slot_entry *)page->index;
+}
+
+static void set_swap_slot(struct page *page, struct swap_slot_entry *slot)
+{
+ page->index = (pgoff_t)slot;
+}
+
/*
* gcma_init - initializes a contiguous memory area
*
@@ -112,6 +149,286 @@ static void gcma_free_page(struct gcma *gcma, struct page *page)
}
/*
+ * In the case that a entry with the same offset is found, a pointer to
+ * the existing entry is stored in dupentry and the function returns -EEXIST.
+ */
+static int frontswap_rb_insert(struct rb_root *root,
+ struct swap_slot_entry *entry,
+ struct swap_slot_entry **dupentry)
+{
+ struct rb_node **link = &root->rb_node, *parent = NULL;
+ struct swap_slot_entry *myentry;
+
+ while (*link) {
+ parent = *link;
+ myentry = rb_entry(parent, struct swap_slot_entry, rbnode);
+ if (myentry->offset > entry->offset)
+ link = &(*link)->rb_left;
+ else if (myentry->offset < entry->offset)
+ link = &(*link)->rb_right;
+ else {
+ *dupentry = myentry;
+ return -EEXIST;
+ }
+ }
+ rb_link_node(&entry->rbnode, parent, link);
+ rb_insert_color(&entry->rbnode, root);
+ return 0;
+}
+
+static void frontswap_rb_erase(struct rb_root *root,
+ struct swap_slot_entry *entry)
+{
+ if (!RB_EMPTY_NODE(&entry->rbnode)) {
+ rb_erase(&entry->rbnode, root);
+ RB_CLEAR_NODE(&entry->rbnode);
+ }
+}
+
+static struct swap_slot_entry *frontswap_rb_search(struct rb_root *root,
+ pgoff_t offset)
+{
+ struct rb_node *node = root->rb_node;
+ struct swap_slot_entry *entry;
+
+ while (node) {
+ entry = rb_entry(node, struct swap_slot_entry, rbnode);
+ if (entry->offset > offset)
+ node = node->rb_left;
+ else if (entry->offset < offset)
+ node = node->rb_right;
+ else
+ return entry;
+ }
+ return NULL;
+}
+
+/* Allocates a page from gcma areas using round-robin way */
+static struct page *frontswap_alloc_page(struct gcma **res_gcma)
+{
+ struct page *page;
+ struct gcma *gcma;
+
+ spin_lock(&ginfo.lock);
+ gcma = list_first_entry(&ginfo.head, struct gcma, list);
+ list_move_tail(&gcma->list, &ginfo.head);
+
+ list_for_each_entry(gcma, &ginfo.head, list) {
+ page = gcma_alloc_page(gcma);
+ if (page) {
+ *res_gcma = gcma;
+ goto out;
+ }
+ }
+
+out:
+ spin_unlock(&ginfo.lock);
+ *res_gcma = gcma;
+ return page;
+}
+
+static void frontswap_free_entry(struct swap_slot_entry *entry)
+{
+ gcma_free_page(entry->gcma, entry->page);
+ kmem_cache_free(swap_slot_entry_cache, entry);
+}
+
+/* Caller should hold frontswap tree spinlock */
+static void swap_slot_entry_get(struct swap_slot_entry *entry)
+{
+ atomic_inc(&entry->refcount);
+}
+
+/*
+ * Caller should hold frontswap tree spinlock.
+ * Remove from the tree and free it, if nobody reference the entry.
+ */
+static void swap_slot_entry_put(struct frontswap_tree *tree,
+ struct swap_slot_entry *entry)
+{
+ int refcount = atomic_dec_return(&entry->refcount);
+
+ BUG_ON(refcount < 0);
+
+ if (refcount == 0) {
+ frontswap_rb_erase(&tree->rbroot, entry);
+ frontswap_free_entry(entry);
+ }
+}
+
+/* Caller should hold frontswap tree spinlock */
+static struct swap_slot_entry *frontswap_find_get(struct frontswap_tree *tree,
+ pgoff_t offset)
+{
+ struct swap_slot_entry *entry;
+ struct rb_root *root = &tree->rbroot;
+
+ assert_spin_locked(&tree->lock);
+ entry = frontswap_rb_search(root, offset);
+ if (entry)
+ swap_slot_entry_get(entry);
+
+ return entry;
+}
+
+void gcma_frontswap_init(unsigned type)
+{
+ struct frontswap_tree *tree;
+
+ tree = kzalloc(sizeof(struct frontswap_tree), GFP_KERNEL);
+ if (!tree) {
+ pr_warn("front swap tree for type %d failed to alloc\n", type);
+ return;
+ }
+
+ tree->rbroot = RB_ROOT;
+ spin_lock_init(&tree->lock);
+ gcma_swap_trees[type] = tree;
+}
+
+int gcma_frontswap_store(unsigned type, pgoff_t offset,
+ struct page *page)
+{
+ struct swap_slot_entry *entry, *dupentry;
+ struct gcma *gcma;
+ struct page *gcma_page = NULL;
+ struct frontswap_tree *tree = gcma_swap_trees[type];
+ u8 *src, *dst;
+ int ret;
+
+ if (!tree) {
+ WARN(1, "frontswap tree for type %d is not exist\n",
+ type);
+ return -ENODEV;
+ }
+
+ gcma_page = frontswap_alloc_page(&gcma);
+ if (!gcma_page)
+ return -ENOMEM;
+
+ entry = kmem_cache_alloc(swap_slot_entry_cache, GFP_NOIO);
+ if (!entry) {
+ gcma_free_page(gcma, gcma_page);
+ return -ENOMEM;
+ }
+
+ entry->gcma = gcma;
+ entry->page = gcma_page;
+ entry->offset = offset;
+ atomic_set(&entry->refcount, 1);
+ RB_CLEAR_NODE(&entry->rbnode);
+
+ set_swap_tree(gcma_page, tree);
+ set_swap_slot(gcma_page, entry);
+
+ /* copy from orig data to gcma-page */
+ src = kmap_atomic(page);
+ dst = kmap_atomic(gcma_page);
+ memcpy(dst, src, PAGE_SIZE);
+ kunmap_atomic(src);
+ kunmap_atomic(dst);
+
+ spin_lock(&tree->lock);
+ do {
+ /*
+ * Though this duplication scenario may happen rarely by
+ * race of swap layer, we handle this case here rather
+ * than fix swap layer because handling the possibility of
+ * duplicates is part of the tmem ABI.
+ */
+ ret = frontswap_rb_insert(&tree->rbroot, entry, &dupentry);
+ if (ret == -EEXIST) {
+ frontswap_rb_erase(&tree->rbroot, dupentry);
+ swap_slot_entry_put(tree, dupentry);
+ }
+ } while (ret == -EEXIST);
+ spin_unlock(&tree->lock);
+
+ return ret;
+}
+
+/*
+ * Returns 0 if success,
+ * Returns non-zero if failed.
+ */
+int gcma_frontswap_load(unsigned type, pgoff_t offset,
+ struct page *page)
+{
+ struct frontswap_tree *tree = gcma_swap_trees[type];
+ struct swap_slot_entry *entry;
+ struct page *gcma_page;
+ u8 *src, *dst;
+
+ if (!tree) {
+ WARN(1, "tree for type %d not exist\n", type);
+ return -1;
+ }
+
+ spin_lock(&tree->lock);
+ entry = frontswap_find_get(tree, offset);
+ spin_unlock(&tree->lock);
+ if (!entry)
+ return -1;
+
+ gcma_page = entry->page;
+ src = kmap_atomic(gcma_page);
+ dst = kmap_atomic(page);
+ memcpy(dst, src, PAGE_SIZE);
+ kunmap_atomic(src);
+ kunmap_atomic(dst);
+
+ spin_lock(&tree->lock);
+ swap_slot_entry_put(tree, entry);
+ spin_unlock(&tree->lock);
+
+ return 0;
+}
+
+void gcma_frontswap_invalidate_page(unsigned type, pgoff_t offset)
+{
+ struct frontswap_tree *tree = gcma_swap_trees[type];
+ struct swap_slot_entry *entry;
+
+ spin_lock(&tree->lock);
+ entry = frontswap_rb_search(&tree->rbroot, offset);
+ if (!entry) {
+ spin_unlock(&tree->lock);
+ return;
+ }
+
+ swap_slot_entry_put(tree, entry);
+ spin_unlock(&tree->lock);
+}
+
+void gcma_frontswap_invalidate_area(unsigned type)
+{
+ struct frontswap_tree *tree = gcma_swap_trees[type];
+ struct swap_slot_entry *entry, *n;
+
+ if (!tree)
+ return;
+
+ spin_lock(&tree->lock);
+ rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode) {
+ frontswap_rb_erase(&tree->rbroot, entry);
+ swap_slot_entry_put(tree, entry);
+ }
+ tree->rbroot = RB_ROOT;
+ spin_unlock(&tree->lock);
+
+ kfree(tree);
+ gcma_swap_trees[type] = NULL;
+}
+
+static struct frontswap_ops gcma_frontswap_ops = {
+ .init = gcma_frontswap_init,
+ .store = gcma_frontswap_store,
+ .load = gcma_frontswap_load,
+ .invalidate_page = gcma_frontswap_invalidate_page,
+ .invalidate_area = gcma_frontswap_invalidate_area
+};
+
+/*
* gcma_alloc_contig - allocates contiguous pages
*
* @start_pfn start pfn of requiring contiguous memory area
@@ -162,6 +479,17 @@ static int __init init_gcma(void)
{
pr_info("loading gcma\n");
+ swap_slot_entry_cache = KMEM_CACHE(swap_slot_entry, 0);
+ if (swap_slot_entry_cache == NULL)
+ return -ENOMEM;
+
+ /*
+ * By writethough mode, GCMA could discard all of pages in an instant
+ * instead of slow writing pages out to the swap device.
+ */
+ frontswap_writethrough(true);
+ frontswap_register_ops(&gcma_frontswap_ops);
+
return 0;
}
--
1.9.1
* [RFC v1 3/6] gcma: evict frontswap pages in LRU order when memory is full
2014-11-11 15:00 [RFC v1 0/6] introduce gcma SeongJae Park
2014-11-11 15:00 ` [RFC v1 1/6] gcma: introduce contiguous memory allocator SeongJae Park
2014-11-11 15:00 ` [RFC v1 2/6] gcma: utilize reserved memory as swap cache SeongJae Park
@ 2014-11-11 15:00 ` SeongJae Park
2014-11-11 15:00 ` [RFC v1 4/6] gcma: discard swap cache pages to meet successful GCMA allocation SeongJae Park
` (3 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2014-11-11 15:00 UTC (permalink / raw)
To: akpm; +Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, SeongJae Park
GCMA uses free pages of the reserved space as swap cache, so over time we may
run short of free space and need to drain some swap cache pages to make room
for newly swapped-out pages.
For that, GCMA manages the swap cache in LRU order so that active pages can be
kept in memory whenever possible. This should give a higher swap-cache hit
ratio than random eviction.
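Concretely, the LRU bookkeeping added by this patch can be summarized as
follows (a condensed view of the diff below, not new code):
```
/*
 * LRU maintenance for frontswap-backed pages (all under slru_lock):
 *
 *   store:  list_add(&page->lru, &slru_list)    new page becomes MRU
 *   load:   list_move(&page->lru, &slru_list)   accessed page becomes MRU
 *   evict:  walk slru_list from the tail        coldest pages are dropped,
 *           each entry pinned with atomic_inc_not_zero() so a concurrent
 *           free cannot race with the eviction
 */
```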
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
mm/gcma.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 88 insertions(+), 5 deletions(-)
diff --git a/mm/gcma.c b/mm/gcma.c
index ddfc0d8..d459116 100644
--- a/mm/gcma.c
+++ b/mm/gcma.c
@@ -19,6 +19,9 @@
#include <linux/highmem.h>
#include <linux/gcma.h>
+/* XXX: What's the ideal? */
+#define NR_EVICT_BATCH 32
+
struct gcma {
spinlock_t lock;
unsigned long *bitmap;
@@ -49,9 +52,13 @@ struct frontswap_tree {
spinlock_t lock;
};
+static LIST_HEAD(slru_list); /* LRU list of swap cache */
+static spinlock_t slru_lock; /* protect slru_list */
static struct frontswap_tree *gcma_swap_trees[MAX_SWAPFILES];
static struct kmem_cache *swap_slot_entry_cache;
+static unsigned long evict_frontswap_pages(unsigned long nr_pages);
+
static struct frontswap_tree *swap_tree(struct page *page)
{
return (struct frontswap_tree *)page->mapping;
@@ -209,6 +216,7 @@ static struct page *frontswap_alloc_page(struct gcma **res_gcma)
struct page *page;
struct gcma *gcma;
+retry:
spin_lock(&ginfo.lock);
gcma = list_first_entry(&ginfo.head, struct gcma, list);
list_move_tail(&gcma->list, &ginfo.head);
@@ -216,13 +224,18 @@ static struct page *frontswap_alloc_page(struct gcma **res_gcma)
list_for_each_entry(gcma, &ginfo.head, list) {
page = gcma_alloc_page(gcma);
if (page) {
- *res_gcma = gcma;
- goto out;
+ spin_unlock(&ginfo.lock);
+ goto got;
}
}
-
-out:
spin_unlock(&ginfo.lock);
+
+ /* Failed to alloc a page from entire gcma. Evict adequate LRU
+ * frontswap slots and try allocation again */
+ if (evict_frontswap_pages(NR_EVICT_BATCH))
+ goto retry;
+
+got:
*res_gcma = gcma;
return page;
}
@@ -240,7 +253,7 @@ static void swap_slot_entry_get(struct swap_slot_entry *entry)
}
/*
- * Caller should hold frontswap tree spinlock.
+ * Caller should hold frontswap tree spinlock and slru_lock.
* Remove from the tree and free it, if nobody reference the entry.
*/
static void swap_slot_entry_put(struct frontswap_tree *tree,
@@ -251,11 +264,67 @@ static void swap_slot_entry_put(struct frontswap_tree *tree,
BUG_ON(refcount < 0);
if (refcount == 0) {
+ struct page *page = entry->page;
+
frontswap_rb_erase(&tree->rbroot, entry);
+ list_del(&page->lru);
+
frontswap_free_entry(entry);
}
}
+/*
+ * evict_frontswap_pages - evict @nr_pages LRU frontswap backed pages
+ *
+ * @nr_pages number of LRU pages to be evicted
+ *
+ * Returns number of successfully evicted pages
+ */
+static unsigned long evict_frontswap_pages(unsigned long nr_pages)
+{
+ struct frontswap_tree *tree;
+ struct swap_slot_entry *entry;
+ struct page *page, *n;
+ unsigned long evicted = 0;
+ LIST_HEAD(free_pages);
+
+ spin_lock(&slru_lock);
+ list_for_each_entry_safe_reverse(page, n, &slru_list, lru) {
+ entry = swap_slot(page);
+
+ /*
+ * the entry could be free by other thread in the while.
+ * check whether the situation occurred and avoid others to
+ * free it by compare reference count and increase it
+ * atomically.
+ */
+ if (!atomic_inc_not_zero(&entry->refcount))
+ continue;
+
+ list_move(&page->lru, &free_pages);
+ if (++evicted >= nr_pages)
+ break;
+ }
+ spin_unlock(&slru_lock);
+
+ list_for_each_entry_safe(page, n, &free_pages, lru) {
+ tree = swap_tree(page);
+ entry = swap_slot(page);
+
+ spin_lock(&tree->lock);
+ spin_lock(&slru_lock);
+ /* drop refcount increased by above loop */
+ swap_slot_entry_put(tree, entry);
+ /* free entry if the entry is still in tree */
+ if (frontswap_rb_search(&tree->rbroot, entry->offset))
+ swap_slot_entry_put(tree, entry);
+ spin_unlock(&slru_lock);
+ spin_unlock(&tree->lock);
+ }
+
+ return evicted;
+}
+
/* Caller should hold frontswap tree spinlock */
static struct swap_slot_entry *frontswap_find_get(struct frontswap_tree *tree,
pgoff_t offset)
@@ -339,9 +408,15 @@ int gcma_frontswap_store(unsigned type, pgoff_t offset,
ret = frontswap_rb_insert(&tree->rbroot, entry, &dupentry);
if (ret == -EEXIST) {
frontswap_rb_erase(&tree->rbroot, dupentry);
+ spin_lock(&slru_lock);
swap_slot_entry_put(tree, dupentry);
+ spin_unlock(&slru_lock);
}
} while (ret == -EEXIST);
+
+ spin_lock(&slru_lock);
+ list_add(&gcma_page->lru, &slru_list);
+ spin_unlock(&slru_lock);
spin_unlock(&tree->lock);
return ret;
@@ -378,7 +453,10 @@ int gcma_frontswap_load(unsigned type, pgoff_t offset,
kunmap_atomic(dst);
spin_lock(&tree->lock);
+ spin_lock(&slru_lock);
+ list_move(&gcma_page->lru, &slru_list);
swap_slot_entry_put(tree, entry);
+ spin_unlock(&slru_lock);
spin_unlock(&tree->lock);
return 0;
@@ -396,7 +474,9 @@ void gcma_frontswap_invalidate_page(unsigned type, pgoff_t offset)
return;
}
+ spin_lock(&slru_lock);
swap_slot_entry_put(tree, entry);
+ spin_unlock(&slru_lock);
spin_unlock(&tree->lock);
}
@@ -411,7 +491,9 @@ void gcma_frontswap_invalidate_area(unsigned type)
spin_lock(&tree->lock);
rbtree_postorder_for_each_entry_safe(entry, n, &tree->rbroot, rbnode) {
frontswap_rb_erase(&tree->rbroot, entry);
+ spin_lock(&slru_lock);
swap_slot_entry_put(tree, entry);
+ spin_unlock(&slru_lock);
}
tree->rbroot = RB_ROOT;
spin_unlock(&tree->lock);
@@ -479,6 +561,7 @@ static int __init init_gcma(void)
{
pr_info("loading gcma\n");
+ spin_lock_init(&slru_lock);
swap_slot_entry_cache = KMEM_CACHE(swap_slot_entry, 0);
if (swap_slot_entry_cache == NULL)
return -ENOMEM;
--
1.9.1
* [RFC v1 4/6] gcma: discard swap cache pages to meet successful GCMA allocation
2014-11-11 15:00 [RFC v1 0/6] introduce gcma SeongJae Park
` (2 preceding siblings ...)
2014-11-11 15:00 ` [RFC v1 3/6] gcma: evict frontswap pages in LRU order when memory is full SeongJae Park
@ 2014-11-11 15:00 ` SeongJae Park
2014-11-11 15:00 ` [RFC v1 5/6] gcma: export statistical data on debugfs SeongJae Park
` (2 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2014-11-11 15:00 UTC (permalink / raw)
To: akpm; +Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, SeongJae Park
GCMA's goal is to allocate contiguous memory successfully at any time while
also using the reserved memory space efficiently.
For memory efficiency, we allow the reserved space to be used as swap cache,
so we must be able to drain those swap cache pages whenever a GCMA user wants
contiguous memory, at any time.
We simply discard swap cache pages if needed.
That is safe because we use frontswap in write-through mode, so all of the
data is already on disk.
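For orientation, a condensed view of what the reworked allocation path does
per page (a summary of the diff below; locking order and the
isolate_interrupted() retry are simplified):
```
/*
 * gcma_alloc_contig(), for each pfn in the requested range:
 *
 *   free in the bitmap      -> set the bit and mark the page GF_ISOLATED
 *   on the swap LRU         -> pin it, clear GF_SWAP_LRU, mark GF_RECLAIMING
 *                              and queue it for discard
 *   transient race (store / -> mark GF_RECLAIMING so whoever holds it
 *   invalidate in flight)      isolates the page when freeing it
 *
 * The queued swap-cache entries are then dropped (safe under write-through
 * frontswap) and any pfns not yet isolated are retried.
 */
```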
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
mm/gcma.c | 192 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 181 insertions(+), 11 deletions(-)
diff --git a/mm/gcma.c b/mm/gcma.c
index d459116..9c07128 100644
--- a/mm/gcma.c
+++ b/mm/gcma.c
@@ -80,6 +80,50 @@ static void set_swap_slot(struct page *page, struct swap_slot_entry *slot)
}
/*
+ * Flags for status of a page in gcma
+ *
+ * GF_SWAP_LRU
+ * The page is being used for frontswap and hang on frontswap LRU list.
+ * It can be drained for contiguous memory allocation anytime.
+ * Protected by slru_lock.
+ *
+ * GF_RECLAIMING
+ * The page is being draining for contiguous memory allocation.
+ * Frontswap guests should not use it.
+ * Protected by slru_lock.
+ *
+ * GF_ISOLATED
+ * The page is isolated for contiguous memory allocation.
+ * GCMA guests can use the page safely while frontswap guests should not.
+ * Protected by gcma->lock.
+ */
+enum gpage_flags {
+ GF_SWAP_LRU = 0x1,
+ GF_RECLAIMING = 0x2,
+ GF_ISOLATED = 0x4,
+};
+
+static int gpage_flag(struct page *page, int flag)
+{
+ return page->private & flag;
+}
+
+static void set_gpage_flag(struct page *page, int flag)
+{
+ page->private |= flag;
+}
+
+static void clear_gpage_flag(struct page *page, int flag)
+{
+ page->private &= ~flag;
+}
+
+static void clear_gpage_flagall(struct page *page)
+{
+ page->private = 0;
+}
+
+/*
* gcma_init - initializes a contiguous memory area
*
* @start_pfn start pfn of contiguous memory area
@@ -137,11 +181,13 @@ static struct page *gcma_alloc_page(struct gcma *gcma)
bitmap_set(bitmap, bit, 1);
page = pfn_to_page(gcma->base_pfn + bit);
spin_unlock(&gcma->lock);
+ clear_gpage_flagall(page);
out:
return page;
}
+/* Caller should hold slru_lock */
static void gcma_free_page(struct gcma *gcma, struct page *page)
{
unsigned long pfn, offset;
@@ -151,7 +197,18 @@ static void gcma_free_page(struct gcma *gcma, struct page *page)
spin_lock(&gcma->lock);
offset = pfn - gcma->base_pfn;
- bitmap_clear(gcma->bitmap, offset, 1);
+ if (likely(!gpage_flag(page, GF_RECLAIMING))) {
+ bitmap_clear(gcma->bitmap, offset, 1);
+ } else {
+ /*
+ * The page should be safe to be used for a thread which
+ * reclaimed the page.
+ * To prevent further allocation from other thread,
+ * set bitmap and mark the page as isolated.
+ */
+ bitmap_set(gcma->bitmap, offset, 1);
+ set_gpage_flag(page, GF_ISOLATED);
+ }
spin_unlock(&gcma->lock);
}
@@ -301,6 +358,7 @@ static unsigned long evict_frontswap_pages(unsigned long nr_pages)
if (!atomic_inc_not_zero(&entry->refcount))
continue;
+ clear_gpage_flag(page, GF_SWAP_LRU);
list_move(&page->lru, &free_pages);
if (++evicted >= nr_pages)
break;
@@ -377,7 +435,9 @@ int gcma_frontswap_store(unsigned type, pgoff_t offset,
entry = kmem_cache_alloc(swap_slot_entry_cache, GFP_NOIO);
if (!entry) {
+ spin_lock(&slru_lock);
gcma_free_page(gcma, gcma_page);
+ spin_unlock(&slru_lock);
return -ENOMEM;
}
@@ -415,6 +475,7 @@ int gcma_frontswap_store(unsigned type, pgoff_t offset,
} while (ret == -EEXIST);
spin_lock(&slru_lock);
+ set_gpage_flag(gcma_page, GF_SWAP_LRU);
list_add(&gcma_page->lru, &slru_list);
spin_unlock(&slru_lock);
spin_unlock(&tree->lock);
@@ -454,7 +515,8 @@ int gcma_frontswap_load(unsigned type, pgoff_t offset,
spin_lock(&tree->lock);
spin_lock(&slru_lock);
- list_move(&gcma_page->lru, &slru_list);
+ if (likely(gpage_flag(gcma_page, GF_SWAP_LRU)))
+ list_move(&gcma_page->lru, &slru_list);
swap_slot_entry_put(tree, entry);
spin_unlock(&slru_lock);
spin_unlock(&tree->lock);
@@ -511,6 +573,43 @@ static struct frontswap_ops gcma_frontswap_ops = {
};
/*
+ * Return 0 if [start_pfn, end_pfn] is isolated.
+ * Otherwise, return first unisolated pfn from the start_pfn.
+ */
+static unsigned long isolate_interrupted(struct gcma *gcma,
+ unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long offset;
+ unsigned long *bitmap;
+ unsigned long pfn, ret = 0;
+ struct page *page;
+
+ spin_lock(&gcma->lock);
+
+ for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+ int set;
+
+ offset = pfn - gcma->base_pfn;
+ bitmap = gcma->bitmap + offset / BITS_PER_LONG;
+
+ set = test_bit(pfn % BITS_PER_LONG, bitmap);
+ if (!set) {
+ ret = pfn;
+ break;
+ }
+
+ page = pfn_to_page(pfn);
+ if (!gpage_flag(page, GF_ISOLATED)) {
+ ret = pfn;
+ break;
+ }
+
+ }
+ spin_unlock(&gcma->lock);
+ return ret;
+}
+
+/*
* gcma_alloc_contig - allocates contiguous pages
*
* @start_pfn start pfn of requiring contiguous memory area
@@ -521,21 +620,92 @@ static struct frontswap_ops gcma_frontswap_ops = {
int gcma_alloc_contig(struct gcma *gcma, unsigned long start_pfn,
unsigned long size)
{
+ LIST_HEAD(free_pages);
+ struct page *page, *n;
+ struct swap_slot_entry *entry;
unsigned long offset;
+ unsigned long *bitmap;
+ struct frontswap_tree *tree;
+ unsigned long pfn;
+ unsigned long orig_start = start_pfn;
- spin_lock(&gcma->lock);
- offset = start_pfn - gcma->base_pfn;
+retry:
+ for (pfn = start_pfn; pfn < start_pfn + size; pfn++) {
+ spin_lock(&gcma->lock);
+
+ offset = pfn - gcma->base_pfn;
+ bitmap = gcma->bitmap + offset / BITS_PER_LONG;
+ page = pfn_to_page(pfn);
+
+ if (!test_bit(offset % BITS_PER_LONG, bitmap)) {
+ /* set a bit for prevent allocation for frontswap */
+ bitmap_set(gcma->bitmap, offset, 1);
+ set_gpage_flag(page, GF_ISOLATED);
+ spin_unlock(&gcma->lock);
+ continue;
+ }
+
+ /* Someone is using the page so it's complicated :( */
+ spin_unlock(&gcma->lock);
+ spin_lock(&slru_lock);
+ /*
+ * If the page is in LRU, we can get swap_slot_entry from
+ * the page with no problem.
+ */
+ if (gpage_flag(page, GF_SWAP_LRU)) {
+ BUG_ON(gpage_flag(page, GF_RECLAIMING));
+
+ entry = swap_slot(page);
+ if (atomic_inc_not_zero(&entry->refcount)) {
+ clear_gpage_flag(page, GF_SWAP_LRU);
+ set_gpage_flag(page, GF_RECLAIMING);
+ list_move(&page->lru, &free_pages);
+ spin_unlock(&slru_lock);
+ continue;
+ }
+ }
- if (bitmap_find_next_zero_area(gcma->bitmap, gcma->size, offset,
- size, 0) != 0) {
+ /*
+ * Someone is allocating the page but it's not yet in LRU
+ * in case of frontswap_store or it was deleted from LRU
+ * but not yet from gcma's bitmap in case of
+ * frontswap_invalidate. Anycase, the race is small so retry
+ * after a while will see success. Below isolate_interrupted
+ * handles it.
+ */
+ spin_lock(&gcma->lock);
+ if (!test_bit(offset % BITS_PER_LONG, bitmap)) {
+ bitmap_set(gcma->bitmap, offset, 1);
+ set_gpage_flag(page, GF_ISOLATED);
+ } else {
+ set_gpage_flag(page, GF_RECLAIMING);
+ }
spin_unlock(&gcma->lock);
- pr_warn("already allocated region required: %lu, %lu",
- start_pfn, size);
- return -EINVAL;
+ spin_unlock(&slru_lock);
}
- bitmap_set(gcma->bitmap, offset, size);
- spin_unlock(&gcma->lock);
+ /*
+ * Since we increased refcount of the page above, we can access
+ * swap_slot_entry with safe
+ */
+ list_for_each_entry_safe(page, n, &free_pages, lru) {
+ tree = swap_tree(page);
+ entry = swap_slot(page);
+
+ spin_lock(&tree->lock);
+ spin_lock(&slru_lock);
+ /* drop refcount increased by above loop */
+ swap_slot_entry_put(tree, entry);
+ /* free entry if the entry is still in tree */
+ if (frontswap_rb_search(&tree->rbroot, entry->offset))
+ swap_slot_entry_put(tree, entry);
+ spin_unlock(&slru_lock);
+ spin_unlock(&tree->lock);
+ }
+
+ start_pfn = isolate_interrupted(gcma, orig_start, orig_start + size);
+ if (start_pfn)
+ goto retry;
return 0;
}
--
1.9.1
* [RFC v1 5/6] gcma: export statistical data on debugfs
2014-11-11 15:00 [RFC v1 0/6] introduce gcma SeongJae Park
` (3 preceding siblings ...)
2014-11-11 15:00 ` [RFC v1 4/6] gcma: discard swap cache pages to meet successful GCMA allocation SeongJae Park
@ 2014-11-11 15:00 ` SeongJae Park
2014-11-11 15:00 ` [RFC v1 6/6] gcma: integrate gcma under cma interface SeongJae Park
2014-11-11 18:57 ` [RFC v1 0/6] introduce gcma Christoph Lameter
6 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2014-11-11 15:00 UTC (permalink / raw)
To: akpm; +Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, SeongJae Park
Export the number of stored / loaded / evicted / reclaimed pages of gcma's
frontswap backend on debugfs to let users see how gcma is working internally.
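Assuming debugfs is mounted at its usual location, the counters added below
should appear as the following read-only files (paths derived from the
debugfs_create_dir()/debugfs_create_atomic_t() calls in the diff):
```
/sys/kernel/debug/gcma/stored_pages
/sys/kernel/debug/gcma/loaded_pages
/sys/kernel/debug/gcma/evicted_pages
/sys/kernel/debug/gcma/reclaimed_pages
```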
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
mm/gcma.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/mm/gcma.c b/mm/gcma.c
index 9c07128..65395ec 100644
--- a/mm/gcma.c
+++ b/mm/gcma.c
@@ -57,6 +57,12 @@ static spinlock_t slru_lock; /* protect slru_list */
static struct frontswap_tree *gcma_swap_trees[MAX_SWAPFILES];
static struct kmem_cache *swap_slot_entry_cache;
+/* For statistics */
+static atomic_t gcma_stored_pages = ATOMIC_INIT(0);
+static atomic_t gcma_loaded_pages = ATOMIC_INIT(0);
+static atomic_t gcma_evicted_pages = ATOMIC_INIT(0);
+static atomic_t gcma_reclaimed_pages = ATOMIC_INIT(0);
+
static unsigned long evict_frontswap_pages(unsigned long nr_pages);
static struct frontswap_tree *swap_tree(struct page *page)
@@ -380,6 +386,7 @@ static unsigned long evict_frontswap_pages(unsigned long nr_pages)
spin_unlock(&tree->lock);
}
+ atomic_add(evicted, &gcma_evicted_pages);
return evicted;
}
@@ -480,6 +487,7 @@ int gcma_frontswap_store(unsigned type, pgoff_t offset,
spin_unlock(&slru_lock);
spin_unlock(&tree->lock);
+ atomic_inc(&gcma_stored_pages);
return ret;
}
@@ -521,6 +529,7 @@ int gcma_frontswap_load(unsigned type, pgoff_t offset,
spin_unlock(&slru_lock);
spin_unlock(&tree->lock);
+ atomic_inc(&gcma_loaded_pages);
return 0;
}
@@ -659,6 +668,7 @@ retry:
if (atomic_inc_not_zero(&entry->refcount)) {
clear_gpage_flag(page, GF_SWAP_LRU);
set_gpage_flag(page, GF_RECLAIMING);
+ atomic_inc(&gcma_reclaimed_pages);
list_move(&page->lru, &free_pages);
spin_unlock(&slru_lock);
continue;
@@ -679,6 +689,7 @@ retry:
set_gpage_flag(page, GF_ISOLATED);
} else {
set_gpage_flag(page, GF_RECLAIMING);
+ atomic_inc(&gcma_reclaimed_pages);
}
spin_unlock(&gcma->lock);
spin_unlock(&slru_lock);
@@ -727,6 +738,40 @@ void gcma_free_contig(struct gcma *gcma,
spin_unlock(&gcma->lock);
}
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+
+static struct dentry *gcma_debugfs_root;
+
+static int __init gcma_debugfs_init(void)
+{
+ if (!debugfs_initialized())
+ return -ENODEV;
+
+ gcma_debugfs_root = debugfs_create_dir("gcma", NULL);
+ if (!gcma_debugfs_root)
+ return -ENOMEM;
+
+ debugfs_create_atomic_t("stored_pages", S_IRUGO,
+ gcma_debugfs_root, &gcma_stored_pages);
+ debugfs_create_atomic_t("loaded_pages", S_IRUGO,
+ gcma_debugfs_root, &gcma_loaded_pages);
+ debugfs_create_atomic_t("evicted_pages", S_IRUGO,
+ gcma_debugfs_root, &gcma_evicted_pages);
+ debugfs_create_atomic_t("reclaimed_pages", S_IRUGO,
+ gcma_debugfs_root, &gcma_reclaimed_pages);
+
+ pr_info("gcma debufs init\n");
+ return 0;
+}
+#else
+static int __init gcma_debugfs_init(void)
+{
+ return 0;
+}
+#endif
+
+
static int __init init_gcma(void)
{
pr_info("loading gcma\n");
@@ -743,6 +788,7 @@ static int __init init_gcma(void)
frontswap_writethrough(true);
frontswap_register_ops(&gcma_frontswap_ops);
+ gcma_debugfs_init();
return 0;
}
--
1.9.1
* [RFC v1 6/6] gcma: integrate gcma under cma interface
2014-11-11 15:00 [RFC v1 0/6] introduce gcma SeongJae Park
` (4 preceding siblings ...)
2014-11-11 15:00 ` [RFC v1 5/6] gcma: export statistical data on debugfs SeongJae Park
@ 2014-11-11 15:00 ` SeongJae Park
2014-11-11 18:57 ` [RFC v1 0/6] introduce gcma Christoph Lameter
6 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2014-11-11 15:00 UTC (permalink / raw)
To: akpm; +Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, SeongJae Park
Currently, cma reserves a large contiguous memory area during early boot and
lets the area be used by others for movable pages only. Then, if those movable
pages are needed for a contiguous memory allocation, cma migrates and/or
discards them.
This mechanism has two weaknesses:
1) Because anyone in the kernel can pin movable pages, contiguous memory
allocation can fail due to migration failure.
2) Because of migration / reclaim overhead, the latency can be extremely high.
In short, cma guarantees neither success nor low latency for contiguous memory
allocation. The problem was discussed in detail in [1] and [2].
gcma, introduced by the patches above, guarantees success and low latency for
contiguous memory allocation. The gcma concept, implementation and performance
evaluation are presented in detail in [2].
This patch lets cma clients use gcma easily through the friendly cma interface
by integrating gcma under that interface.
After this patch, clients can declare a contiguous memory area to be managed
internally in the gcma way instead of the cma way by calling
gcma_declare_contiguous(). After the declaration, clients can use the area
through the familiar cma interface while it works in the gcma way.
For example, the following code snippet makes two contiguous regions: one
region works as cma and the other works as gcma.
```
struct cma *cma, *gcma;
cma_declare_contiguous(base, size, limit, 0, 0, fixed, &cma);
gcma_declare_contiguous(gcma_base, size, gcma_limit, 0, 0, fixed, &gcma);
cma_alloc(cma, 1024, 0); /* alloc in cma way */
cma_alloc(gcma, 1024, 0); /* alloc in gcma way */
```
[1] https://lkml.org/lkml/2013/10/30/16
[2] http://sched.co/1qZcBAO
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
include/linux/cma.h | 4 ++
include/linux/gcma.h | 21 ++++++++++
mm/Kconfig | 15 +++++++
mm/Makefile | 2 +
mm/cma.c | 110 ++++++++++++++++++++++++++++++++++++++++-----------
5 files changed, 129 insertions(+), 23 deletions(-)
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 371b930..f81d0dd 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -22,6 +22,10 @@ extern int __init cma_declare_contiguous(phys_addr_t size,
phys_addr_t base, phys_addr_t limit,
phys_addr_t alignment, unsigned int order_per_bit,
bool fixed, struct cma **res_cma);
+extern int __init gcma_declare_contiguous(phys_addr_t size,
+ phys_addr_t base, phys_addr_t limit,
+ phys_addr_t alignment, unsigned int order_per_bit,
+ bool fixed, struct cma **res_cma);
extern struct page *cma_alloc(struct cma *cma, int count, unsigned int align);
extern bool cma_release(struct cma *cma, struct page *pages, int count);
#endif
diff --git a/include/linux/gcma.h b/include/linux/gcma.h
index d733a9b..dedbd0f 100644
--- a/include/linux/gcma.h
+++ b/include/linux/gcma.h
@@ -16,6 +16,25 @@
struct gcma;
+#ifndef CONFIG_GCMA
+
+inline int gcma_init(unsigned long start_pfn, unsigned long size,
+ struct gcma **res_gcma)
+{
+ return 0;
+}
+
+inline int gcma_alloc_contig(struct gcma *gcma,
+ unsigned long start, unsigned long end)
+{
+ return 0;
+}
+
+void gcma_free_contig(struct gcma *gcma,
+ unsigned long pfn, unsigned long nr_pages) { }
+
+#else
+
int gcma_init(unsigned long start_pfn, unsigned long size,
struct gcma **res_gcma);
int gcma_alloc_contig(struct gcma *gcma,
@@ -23,4 +42,6 @@ int gcma_alloc_contig(struct gcma *gcma,
void gcma_free_contig(struct gcma *gcma,
unsigned long start_pfn, unsigned long size);
+#endif
+
#endif /* _LINUX_GCMA_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 886db21..1b232e3 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -519,6 +519,21 @@ config CMA_AREAS
If unsure, leave the default value "7".
+config GCMA
+ bool "Guaranteed Contiguous Memory Allocator (EXPERIMENTAL)"
+ default n
+ select FRONTSWAP
+ select CMA
+ help
+ A contiguous memory allocator which guarantees success and
+ predictable latency for allocation request.
+ It carves out large amount of memory and let them be allocated
+ to the contiguous memory request while it can be used as backend
+ for frontswap.
+
+ This is marked experimental because it is a new feature that
+ interacts heavily with memory reclaim.
+
config MEM_SOFT_DIRTY
bool "Track memory changes"
depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS
diff --git a/mm/Makefile b/mm/Makefile
index 632ae77..ecff2c7 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_HAVE_MEMBLOCK) += memblock.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o
obj-$(CONFIG_FRONTSWAP) += frontswap.o
obj-$(CONFIG_ZSWAP) += zswap.o
+obj-$(CONFIG_GCMA) += gcma.o
obj-$(CONFIG_HAS_DMA) += dmapool.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
obj-$(CONFIG_NUMA) += mempolicy.o
@@ -64,3 +65,4 @@ obj-$(CONFIG_ZBUD) += zbud.o
obj-$(CONFIG_ZSMALLOC) += zsmalloc.o
obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
obj-$(CONFIG_CMA) += cma.o
+obj-$(CONFIG_GCMA) += gcma.o
diff --git a/mm/cma.c b/mm/cma.c
index c17751c..b085288 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -32,6 +32,9 @@
#include <linux/slab.h>
#include <linux/log2.h>
#include <linux/cma.h>
+#include <linux/gcma.h>
+
+#define IS_GCMA ((struct gcma *)(void *)0xFF)
struct cma {
unsigned long base_pfn;
@@ -39,6 +42,7 @@ struct cma {
unsigned long *bitmap;
unsigned int order_per_bit; /* Order of pages represented by one bit */
struct mutex lock;
+ struct gcma *gcma;
};
static struct cma cma_areas[MAX_CMA_AREAS];
@@ -83,26 +87,25 @@ static void cma_clear_bitmap(struct cma *cma, unsigned long pfn, int count)
mutex_unlock(&cma->lock);
}
-static int __init cma_activate_area(struct cma *cma)
+/*
+ * Return reserved pages for CMA to buddy allocator for using those pages
+ * as movable pages.
+ * Return 0 if it's called successfully. Otherwise, non-zero.
+ */
+static int free_reserved_pages(unsigned long pfn, unsigned long count)
{
- int bitmap_size = BITS_TO_LONGS(cma_bitmap_maxno(cma)) * sizeof(long);
- unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
- unsigned i = cma->count >> pageblock_order;
+ int ret = 0;
+ unsigned long base_pfn;
struct zone *zone;
- cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
-
- if (!cma->bitmap)
- return -ENOMEM;
-
- WARN_ON_ONCE(!pfn_valid(pfn));
+ count = count >> pageblock_order;
zone = page_zone(pfn_to_page(pfn));
do {
- unsigned j;
+ unsigned i;
base_pfn = pfn;
- for (j = pageblock_nr_pages; j; --j, pfn++) {
+ for (i = pageblock_nr_pages; i; --i, pfn++) {
WARN_ON_ONCE(!pfn_valid(pfn));
/*
* alloc_contig_range requires the pfn range
@@ -110,18 +113,40 @@ static int __init cma_activate_area(struct cma *cma)
* simple by forcing the entire CMA resv range
* to be in the same zone.
*/
- if (page_zone(pfn_to_page(pfn)) != zone)
- goto err;
+ if (page_zone(pfn_to_page(pfn)) != zone) {
+ ret = -EINVAL;
+ break;
+ }
}
init_cma_reserved_pageblock(pfn_to_page(base_pfn));
- } while (--i);
+ } while (--count);
+ return ret;
+}
+
+static int __init cma_activate_area(struct cma *cma)
+{
+ int bitmap_size = BITS_TO_LONGS(cma_bitmap_maxno(cma)) * sizeof(long);
+ unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
+ int fail;
+
+ cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+ if (!cma->bitmap)
+ return -ENOMEM;
+
+ WARN_ON_ONCE(!pfn_valid(pfn));
+
+ if (cma->gcma == IS_GCMA)
+ fail = gcma_init(cma->base_pfn, cma->count, &cma->gcma);
+ else
+ fail = free_reserved_pages(cma->base_pfn, cma->count);
+ if (fail != 0) {
+ kfree(cma->bitmap);
+ return -EINVAL;
+ }
mutex_init(&cma->lock);
return 0;
-
-err:
- kfree(cma->bitmap);
- return -EINVAL;
}
static int __init cma_init_reserved_areas(void)
@@ -140,7 +165,7 @@ static int __init cma_init_reserved_areas(void)
core_initcall(cma_init_reserved_areas);
/**
- * cma_declare_contiguous() - reserve custom contiguous area
+ * __declare_contiguous() - reserve custom contiguous area
* @base: Base address of the reserved area optional, use 0 for any
* @size: Size of the reserved area (in bytes),
* @limit: End address of the reserved memory (optional, 0 for any).
@@ -157,7 +182,7 @@ core_initcall(cma_init_reserved_areas);
* If @fixed is true, reserve contiguous area at exactly @base. If false,
* reserve in range from @base to @limit.
*/
-int __init cma_declare_contiguous(phys_addr_t base,
+int __init __declare_contiguous(phys_addr_t base,
phys_addr_t size, phys_addr_t limit,
phys_addr_t alignment, unsigned int order_per_bit,
bool fixed, struct cma **res_cma)
@@ -235,6 +260,36 @@ err:
}
/**
+ * gcma_declare_contiguous() - same as cma_declare_contiguous() except result
+ * cma's is_gcma field setting.
+ */
+int __init gcma_declare_contiguous(phys_addr_t base,
+ phys_addr_t size, phys_addr_t limit,
+ phys_addr_t alignment, unsigned int order_per_bit,
+ bool fixed, struct cma **res_cma)
+{
+ int ret = 0;
+ ret = __declare_contiguous(base, size, limit, alignment,
+ order_per_bit, fixed, res_cma);
+ if (ret >= 0)
+ (*res_cma)->gcma = IS_GCMA;
+
+ return ret;
+}
+
+int __init cma_declare_contiguous(phys_addr_t base,
+ phys_addr_t size, phys_addr_t limit,
+ phys_addr_t alignment, unsigned int order_per_bit,
+ bool fixed, struct cma **res_cma)
+{
+ int ret = 0;
+ ret = __declare_contiguous(base, size, limit, alignment,
+ order_per_bit, fixed, res_cma);
+
+ return ret;
+}
+
+/**
* cma_alloc() - allocate pages from contiguous area
* @cma: Contiguous memory region for which the allocation is performed.
* @count: Requested number of pages.
@@ -281,7 +336,12 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
mutex_lock(&cma_mutex);
- ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+
+ if (cma->gcma)
+ ret = gcma_alloc_contig(cma->gcma, pfn, count);
+ else
+ ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+
mutex_unlock(&cma_mutex);
if (ret == 0) {
page = pfn_to_page(pfn);
@@ -328,7 +388,11 @@ bool cma_release(struct cma *cma, struct page *pages, int count)
VM_BUG_ON(pfn + count > cma->base_pfn + cma->count);
- free_contig_range(pfn, count);
+ if (cma->gcma)
+ gcma_free_contig(cma->gcma, pfn, count);
+ else
+ free_contig_range(pfn, count);
+
cma_clear_bitmap(cma, pfn, count);
return true;
--
1.9.1
* Re: [RFC v1 0/6] introduce gcma
2014-11-11 15:00 [RFC v1 0/6] introduce gcma SeongJae Park
` (5 preceding siblings ...)
2014-11-11 15:00 ` [RFC v1 6/6] gcma: integrate gcma under cma interface SeongJae Park
@ 2014-11-11 18:57 ` Christoph Lameter
2014-11-12 7:02 ` SeongJae Park
6 siblings, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2014-11-11 18:57 UTC (permalink / raw)
To: SeongJae Park; +Cc: akpm, lauraa, minchan, sergey.senozhatsky, linux-mm
On Wed, 12 Nov 2014, SeongJae Park wrote:
> Difference with cma is choice and operation of 2nd-class client. In gcma,
> 2nd-class client should allocate pages from the reserved area only if the
> allocated pages meet the following conditions.
How about making CMA configurable in some fashion to be able to specify
the type of 2nd class clients? Clean page-cache pages can also be rather
easily evicted (see zone-reclaim). You could migrate them out when they
are dirtied so that you do not have the high writeback latency from the
CMA reserved area if it needs to be evicted later.
* Re: [RFC v1 0/6] introduce gcma
2014-11-11 18:57 ` [RFC v1 0/6] introduce gcma Christoph Lameter
@ 2014-11-12 7:02 ` SeongJae Park
0 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2014-11-12 7:02 UTC (permalink / raw)
To: Christoph Lameter
Cc: SeongJae Park, akpm, lauraa, minchan, sergey.senozhatsky, linux-mm
Hi Christoph,
On Tue, 11 Nov 2014, Christoph Lameter wrote:
> On Wed, 12 Nov 2014, SeongJae Park wrote:
>
>> Difference with cma is choice and operation of 2nd-class client. In gcma,
>> 2nd-class client should allocate pages from the reserved area only if the
>> allocated pages meet the following conditions.
>
> How about making CMA configurable in some fashion to be able to specify
> the type of 2nd class clients? Clean page-cache pages can also be rather
> easily evicted (see zone-reclaim). You could migrate them out when they
> are dirtied so that you do not have the high writeback latency from the
> CMA reserved area if it needs to be evicted later.
Nice point.
Currently, gcma is integrated inside cma, and the user can decide whether a
specific contiguous memory area works in the cma way (movable pages as the
2nd class) or in the gcma way (out-of-kernel, easy-to-discard pages as the
2nd class).
This is implemented in the 6th change of this RFC, "gcma: integrate gcma under
cma interface".
In short, with this RFC the 2nd-class clients of cma are already configurable
between movable pages and the frontswap backend.
And yes, cleancache would be a great 2nd-class client.
As described in the cover letter, our 2nd-class client candidates are
frontswap and _cleancache_. However, because gcma is still immature, the
current RFC (this patchset) uses only frontswap.
In the future, it will be configurable.
Apologies, I forgot to describe the future plan.
Thanks,
SeongJae Park
>
>