From: Dave Hansen <haveblue@us.ibm.com>
To: lhms <lhms-devel@lists.sourceforge.net>
Cc: linux-mm <linux-mm@kvack.org>
Subject: [RFC] consistency of zone->zone_start_pfn, spanned_pages
Date: Fri, 13 May 2005 09:00:19 -0700
Message-ID: <1116000019.32433.10.camel@localhost>

The zone struct has a few important members which explicitly define the
range of pages that it manages: zone_start_pfn and spanned_pages.

The current memory hotplug code has concentrated on appending memory to
existing zones, which means just increasing spanned_pages.  There is
currently no code that breaks at runtime if this value is simply
incremented.
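
For reference, the two fields in question look like this (abbreviated
from struct zone in include/linux/mmzone.h; everything unrelated is
omitted):

        struct zone {
                ...
                /* pfn of the first page the zone spans */
                unsigned long           zone_start_pfn;
                /* total pages spanned, including any holes */
                unsigned long           spanned_pages;
                ...
        };

so a page lies inside the zone's span iff

        zone_start_pfn <= page_to_pfn(page)
                       <  zone_start_pfn + spanned_pages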

However, some cases exist where memory will be added to the beginning
of (or before) an existing zone.  That means that zone_start_pfn will need
to change.  Now, consider what happens if this code: 

        static int bad_range(struct zone *zone, struct page *page)
        {
                if (page_to_pfn(page) >= 
                    zone->zone_start_pfn + zone->spanned_pages)
                        return 1;
        ...

is run while zone_start_pfn is being decremented, but before
zone->spanned_pages has been incremented.
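
To make the window concrete, a grow-down resize done as two plain
stores would look roughly like this (hypothetical sequence, not code
from any current patch):

        /*
         * Between the two stores, zone_start_pfn + spanned_pages has
         * moved DOWN by nr_pages, so a concurrent bad_range() can
         * flag a page near the old end of the zone as out of bounds.
         */
        zone->zone_start_pfn -= nr_pages;
        /* <-- bad_range() run here sees a shrunken span */
        zone->spanned_pages  += nr_pages;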

bad_range() has four callers: __free_pages_bulk() (twice), expand(),
and buffered_rmqueue().  Of these, all but buffered_rmqueue() make the
bad_range() call under zone->lock.

So, one idea I had was to move just the zone-boundary part of
bad_range() (the zone_start_pfn/spanned_pages checks) under zone->lock
and hold the lock during just the part of a hotplug operation where the
resize occurs (patch appended).  This will, however, increase lock hold
times.  I'm currently running performance tests to see whether the
increase is detectable.
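
The resize side would then look something like the sketch below (the
name grow_zone_span() and the exact arithmetic are mine, just for
illustration):

        /*
         * Sketch: do the span update under zone->lock so that the
         * boundary check in bad_range() always sees a consistent
         * (zone_start_pfn, spanned_pages) pair.
         */
        static void grow_zone_span(struct zone *zone,
                        unsigned long start_pfn, unsigned long end_pfn)
        {
                unsigned long flags;
                unsigned long old_end;

                spin_lock_irqsave(&zone->lock, flags);
                old_end = zone->zone_start_pfn + zone->spanned_pages;
                if (start_pfn < zone->zone_start_pfn)
                        zone->zone_start_pfn = start_pfn;
                zone->spanned_pages = max(old_end, end_pfn) -
                                        zone->zone_start_pfn;
                spin_unlock_irqrestore(&zone->lock, flags);
        }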

Another idea I had was to use memory barriers.  But that's quite a bit
more complex than a single lock, and I think it has the potential to be
even more expensive.

Perhaps a seqlock that is conditional on memory hotplug?
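
If that route looks interesting, the read side might end up something
like this (sketch only; zone->span_seqlock is a field I'm making up
here for illustration):

        /* reader: retry if a hotplug resize raced with us */
        int page_outside_zone_boundaries(struct zone *zone, struct page *page)
        {
                unsigned long pfn = page_to_pfn(page);
                unsigned seq;
                int ret;

                do {
                        seq = read_seqbegin(&zone->span_seqlock);
                        ret = pfn < zone->zone_start_pfn ||
                              pfn >= zone->zone_start_pfn +
                                     zone->spanned_pages;
                } while (read_seqretry(&zone->span_seqlock, seq));

                return ret;
        }

        /* writer (the hotplug resize path): */
        write_seqlock(&zone->span_seqlock);
        /* ... update zone_start_pfn and spanned_pages ... */
        write_sequnlock(&zone->span_seqlock);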

Any other ideas?

-- Dave


When doing memory hotplug operations, the size of existing zones can
obviously change.  This means that zone->zone_{start_pfn,spanned_pages}
can change.

There are currently no locks that protect these structure members.
However, they are rarely accessed at runtime.  Outside of swsusp, the
only place I can find that reads them is bad_range().

bad_range() has four callers: __free_pages_bulk() (twice), expand(),
and buffered_rmqueue().  Of these, all but buffered_rmqueue() make the
bad_range() call under zone->lock.



---

 memhotplug-dave/mm/page_alloc.c |   30 ++++++++++++++++++++++++++----
 1 files changed, 26 insertions(+), 4 deletions(-)

diff -puN mm/page_alloc.c~bad_range-rework mm/page_alloc.c
--- memhotplug/mm/page_alloc.c~bad_range-rework	2005-05-12 15:36:04.000000000 -0700
+++ memhotplug-dave/mm/page_alloc.c	2005-05-12 16:07:07.000000000 -0700
@@ -76,20 +76,40 @@ unsigned long __initdata nr_kernel_pages
 unsigned long __initdata nr_all_pages;
 
 /*
- * Temporary debugging check for pages not lying within a given zone.
+ * Due to memory hotplug, this function needs to be called
+ * under zone->lock.
  */
-static int bad_range(struct zone *zone, struct page *page)
+static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {
 	if (page_to_pfn(page) >= zone->zone_start_pfn + zone->spanned_pages)
 		return 1;
 	if (page_to_pfn(page) < zone->zone_start_pfn)
 		return 1;
+
+	return 0;
+}
+
+static int page_is_consistent(struct zone *zone, struct page *page)
+{
 #ifdef CONFIG_HOLES_IN_ZONE
 	if (!pfn_valid(page_to_pfn(page)))
-		return 1;
+		return 0;
 #endif
 	if (zone != page_zone(page))
+		return 0;
+
+	return 1;
+}
+/*
+ * Temporary debugging check for pages not lying within a given zone.
+ */
+static int bad_range(struct zone *zone, struct page *page)
+{
+	if (page_outside_zone_boundaries(zone, page))
 		return 1;
+	if (!page_is_consistent(zone, page))
+		return 1;
+
 	return 0;
 }
 
@@ -502,6 +522,7 @@ static int rmqueue_bulk(struct zone *zon
 		page = __rmqueue(zone, order);
 		if (page == NULL)
 			break;
+		BUG_ON(page_outside_zone_boundaries(zone, page));
 		allocated++;
 		list_add_tail(&page->lru, list);
 	}
@@ -674,11 +695,12 @@ buffered_rmqueue(struct zone *zone, int 
 	if (page == NULL) {
 		spin_lock_irqsave(&zone->lock, flags);
 		page = __rmqueue(zone, order);
+		BUG_ON(page && page_outside_zone_boundaries(zone, page));
 		spin_unlock_irqrestore(&zone->lock, flags);
 	}
 
 	if (page != NULL) {
-		BUG_ON(bad_range(zone, page));
+		BUG_ON(!page_is_consistent(zone, page));
 		mod_page_state_zone(zone, pgalloc, 1 << order);
 		prep_new_page(page, order);
 
_

