linux-mm.kvack.org archive mirror
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: linux-mm <linux-mm@kvack.org>
Subject: [Lhms-devel] [RFC] page based page release handler
Date: Tue, 14 Jun 2005 09:56:23 +0900
Message-ID: <42AE2B37.9050802@jp.fujitsu.com>

Hi,

  Attached is a sample implementation of a page-based release handler.
I wrote it with the intention of enhancing memory hotplug.
This is only an RFC, and I would like some comments.

  When a page is isolated, it is outside the kernel's memory management.
Here, an *isolated page* means one of the following:
   (a) a page that is allocated and intentionally never freed.
   (b) a page that has PG_reserved set and is out of the allocator's control.
   (c) a page that is allocated and has no release routine.
This is good enough today; such a page simply leaks from the kernel.

When the kernel wants to remove or move such a page, I think it has
3 options:
(1) try calling every subsystem's release handler, one by one
     (the kernel doesn't know the page's owner)
(2) call the release handler attached to the page.
(3) give up ;(

This patch is the base for (2): it registers a page release handler
in a pfn-indexed radix tree. There is one radix tree per zone.
I think (1) would be more complicated than (2).
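
As a rough illustration of approach (2), here is a minimal userspace sketch,
not kernel code. It assumes a small fixed-size array per zone standing in for
the radix tree, and every name in it (toy_zone, toy_lookup, toy_register,
toy_release_ok, TOY_ZONE_PAGES) is hypothetical and does not appear in the
patch. It only shows why a pfn-indexed table lets the kernel find a page's
handler without knowing its owner.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ZONE_PAGES 64	/* hypothetical zone size, in pages */

typedef int (*toy_release_fn)(unsigned long pfn);

/* One table per zone, indexed by the pfn offset inside the zone. */
struct toy_zone {
	unsigned long start_pfn;
	toy_release_fn handler[TOY_ZONE_PAGES];	/* NULL: nothing registered */
};

/* Return nonzero if pfn falls inside the zone. */
static int toy_pfn_in_zone(const struct toy_zone *z, unsigned long pfn)
{
	return pfn >= z->start_pfn && pfn - z->start_pfn < TOY_ZONE_PAGES;
}

/* Return the handler registered for pfn, or NULL if there is none. */
static toy_release_fn toy_lookup(const struct toy_zone *z, unsigned long pfn)
{
	assert(z != NULL);
	if (!toy_pfn_in_zone(z, pfn))
		return NULL;
	return z->handler[pfn - z->start_pfn];
}

/* Register fn for pfn: 0 on success, -1 if out of range or already taken. */
static int toy_register(struct toy_zone *z, unsigned long pfn, toy_release_fn fn)
{
	assert(z != NULL);
	if (!toy_pfn_in_zone(z, pfn) || z->handler[pfn - z->start_pfn])
		return -1;
	z->handler[pfn - z->start_pfn] = fn;
	return 0;
}

/* Sample handler that always succeeds. */
static int toy_release_ok(unsigned long pfn)
{
	(void)pfn;
	return 0;
}
```

With approach (1) the caller would instead have to ask each subsystem in turn
whether it owns the pfn; here a single indexed lookup answers that question.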

If anyone has ideas, please reply.

Regards,
-- Kame

--

<<Introduction>>
In some subsystems, a page is sometimes isolated from the kernel.

*isolated* means
1. pages are allocated and never freed.
2. pages are set PG_reserved and removed from the kernel's page allocator.
3. the subsystem has no page release routine which can be called by
    the kernel's memory management system.

Currently, the kernel's page allocator does not care about isolated pages
and they are simply leaked. This works well enough.

With memory hotplug, however, leaked pages cannot be removed at hot-remove.
To remove all these pages, memory hot-remove would have to call
every subsystem's memory release handler, and these are not implemented now.

In this patch, a new interface, page_set_hold()/page_release(), is implemented.

- page_set_hold() registers a page release handler for the page.
- page_release()  calls the per-page page release handler.
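
The intended calling convention can be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions, not the patch's
implementation: a fixed array replaces the per-zone radix tree, no locking is
modelled, and the toy_* names are hypothetical. It also assumes that a nonzero
`overwrite` is meant to replace an existing entry. The return values follow
the patch: page_release() returns 1 when no handler is registered, passes a
handler failure through while leaving the registration in place, and
unregisters the handler only after it succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NR_PAGES 16	/* hypothetical: tiny pfn space for the model */

struct toy_holder_ops {
	const char *name;
	int (*release)(unsigned long pfn);	/* 0 on success */
};

static struct toy_holder_ops *toy_holder[TOY_NR_PAGES];

/* Model of page_set_hold(): 1 if a handler exists and overwrite is 0. */
static int toy_set_hold(unsigned long pfn, struct toy_holder_ops *ops,
			int overwrite)
{
	assert(ops != NULL);
	if (pfn >= TOY_NR_PAGES)
		return -EINVAL;
	if (toy_holder[pfn] && !overwrite)
		return 1;
	toy_holder[pfn] = ops;	/* assumption: overwrite replaces the entry */
	return 0;
}

/* Model of page_release(): unregister only if the handler succeeds. */
static int toy_release(unsigned long pfn)
{
	struct toy_holder_ops *ops;
	int error;

	if (pfn >= TOY_NR_PAGES)
		return 1;
	ops = toy_holder[pfn];
	if (!ops || !ops->release)	/* no release handler for this page */
		return 1;
	error = ops->release(pfn);
	if (error)
		return error;		/* handler stays registered */
	toy_holder[pfn] = NULL;
	return 0;
}

/* Hypothetical subsystem: refuses to release while toy_busy is set. */
static int toy_busy;

static int toy_driver_release(unsigned long pfn)
{
	(void)pfn;
	return toy_busy ? -EBUSY : 0;
}

static struct toy_holder_ops toy_driver_ops = {
	"toy_driver", toy_driver_release
};
```

A caller such as memory hot-remove would treat a return of 0 as "the page is
now releasable" and anything else as "leave this page alone for now".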

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


---

  linux-2.6.12-rc6-kamezawa/include/linux/mm.h     |   42 ++++++++++++
  linux-2.6.12-rc6-kamezawa/include/linux/mmzone.h |    5 +
  linux-2.6.12-rc6-kamezawa/mm/page_alloc.c        |   79 +++++++++++++++++++++++
  3 files changed, 126 insertions(+)

diff -puN include/linux/mmzone.h~page_hold include/linux/mmzone.h
--- linux-2.6.12-rc6/include/linux/mmzone.h~page_hold	2005-06-10 15:54:06.000000000 +0900
+++ linux-2.6.12-rc6-kamezawa/include/linux/mmzone.h	2005-06-13 15:43:07.000000000 +0900
@@ -13,6 +13,7 @@
  #include <linux/numa.h>
  #include <linux/init.h>
  #include <asm/atomic.h>
+#include <linux/radix-tree.h>

  /* Free memory management - zoned buddy allocator.  */
  #ifndef CONFIG_FORCE_MAX_ZONEORDER
@@ -206,6 +207,10 @@ struct zone {
  	unsigned long		spanned_pages;	/* total size, including holes */
  	unsigned long		present_pages;	/* amount of memory (excluding holes) */

+	/* used for page_set_hold(). */
+	rwlock_t		page_holder_lock;
+	struct radix_tree_root  page_holder;
+
  	/*
  	 * rarely used fields:
  	 */
diff -puN include/linux/mm.h~page_hold include/linux/mm.h
--- linux-2.6.12-rc6/include/linux/mm.h~page_hold	2005-06-10 16:15:13.000000000 +0900
+++ linux-2.6.12-rc6-kamezawa/include/linux/mm.h	2005-06-13 16:20:55.000000000 +0900
@@ -654,6 +654,48 @@ struct shrinker;
  extern struct shrinker *set_shrinker(int, shrinker_t);
  extern void remove_shrinker(struct shrinker *shrinker);

+
+
+/*
+ * The following pages cannot be freed by the kernel's memory management
+ * (page allocator, kswapd, etc.):
+ *
+ * 1. a page that is allocated and never freed.
+ * 2. a page with PG_reserved set (if it's mapped by processes).
+ * 3. a page held by a subsystem which has no interface for page shrinking/release.
+ *
+ * page_set_hold()/page_release() are a generic interface for registering a page release handler.
+ * With this interface, any subsystem can implement its own
+ * page release handler and page type recognition in a generic way.
+ *
+ * Note: Because major subsystems (filesystems etc.) keep their own handler/information in
+ *       the page struct, they will not need this.
+ *       This interface doesn't support a stack of handlers for a page.
+ */
+
+struct page_holder_ops {
+        char    *name;
+        int     (*release)(struct page *);
+};
+extern int page_set_hold(struct page *page, struct page_holder_ops *ops, int overwrite);
+extern void page_unset_hold(struct page *page);
+extern int page_release(struct page *page);
+extern struct page_holder_ops *__is_page_held(struct page *page);
+
+static inline int is_page_held(struct page *page)
+{
+	return (__is_page_held(page))? 1 : 0;
+}
+
+static inline char *page_owner(struct page *page)
+{
+	struct page_holder_ops *ops;
+	ops = __is_page_held(page);
+	if (!ops)
+		return NULL;
+	return ops->name;
+}
+
  /*
   * On a two-level or three-level page table, this ends up being trivial. Thus
   * the inlining and the symmetry break with pte_alloc_map() that does all
diff -puN mm/page_alloc.c~page_hold mm/page_alloc.c
--- linux-2.6.12-rc6/mm/page_alloc.c~page_hold	2005-06-10 18:26:25.000000000 +0900
+++ linux-2.6.12-rc6-kamezawa/mm/page_alloc.c	2005-06-13 16:30:02.000000000 +0900
@@ -1739,6 +1739,8 @@ static void __init free_area_init_core(s
  			printk(KERN_CRIT "BUG: wrong zone alignment, it will crash\n");

  		memmap_init(size, nid, j, zone_start_pfn);
+		rwlock_init(&zone->page_holder_lock);
+		INIT_RADIX_TREE(&zone->page_holder, GFP_KERNEL);

  		zone_start_pfn += size;

@@ -2236,3 +2238,80 @@ void *__init alloc_large_system_hash(con

  	return table;
  }
+
+/*
+ * page_set_hold()/page_release()
+ * The information is managed in a per-zone radix tree.
+ */
+static struct page_holder_ops * __lookup_page_holder(struct zone *zone, unsigned long pfn)
+{
+	struct page_holder_ops *op;
+	read_lock_irq(&zone->page_holder_lock);
+	op = radix_tree_lookup(&zone->page_holder, pfn);
+	read_unlock_irq(&zone->page_holder_lock);
+	return op;
+}
+
+static int add_to_page_holder(struct zone *zone, unsigned long pfn, struct page_holder_ops *ops)
+{
+	int error;
+	error = radix_tree_preload(GFP_KERNEL);
+	if (!error) {
+		write_lock_irq(&zone->page_holder_lock);
+		error = radix_tree_insert(&zone->page_holder, pfn, ops);
+		write_unlock_irq(&zone->page_holder_lock);
+		radix_tree_preload_end();
+	}
+	return error;
+}
+
+static int remove_from_page_holder(struct zone *zone, unsigned long pfn)
+{
+	struct page_holder_ops *ops;
+	write_lock_irq(&zone->page_holder_lock);
+	ops = radix_tree_delete(&zone->page_holder, pfn);
+	write_unlock_irq(&zone->page_holder_lock);
+	return (ops)? 1 : 0;
+}
+
+int page_set_hold(struct page *page, struct page_holder_ops *ops, int overwrite)
+{
+	struct zone *zone;
+	struct page_holder_ops *tmp;
+	unsigned long pfn = page_to_pfn(page);
+	int error;
+	zone = page_zone(page);
+	tmp = __lookup_page_holder(zone, pfn);
+	if (!overwrite && tmp) {
+		printk(KERN_WARNING "page_set_hold: handler already registered, not overwriting [%s] %lx\n", ops->name, pfn);
+		return 1;
+	}
+	error = add_to_page_holder(zone, pfn, ops);
+	return error;
+}
+
+void page_unset_hold(struct page *page)
+{
+	remove_from_page_holder(page_zone(page), page_to_pfn(page));
+}
+
+struct page_holder_ops *__is_page_held(struct page *page) {
+	return  __lookup_page_holder(page_zone(page), page_to_pfn(page));
+}
+
+int page_release(struct page *page)
+{
+	int error;
+	struct page_holder_ops *ops;
+	struct zone *zone = page_zone(page);
+	unsigned long pfn = page_to_pfn(page);
+
+	ops = __lookup_page_holder(zone, pfn);
+	if ((!ops) || (!ops->release)) /* this page has no page release handler */
+		return 1;
+	error = (*ops->release)(page);
+	if (error)
+		return error;
+	remove_from_page_holder(zone, pfn);
+	return 0;
+}

_



