[RFC] consistency of zone->zone_start_pfn, spanned_pages

From: Dave Hansen @ 2005-05-13 16:00 UTC
  To: lhms; +Cc: linux-mm

The zone struct has a few important members which explicitly define the
range of pages that it manages: zone_start_pfn and spanned_pages.
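
For reference, these are the fields in question, paraphrased from
include/linux/mmzone.h (unrelated members omitted):

        struct zone {
                /* ... free lists, watermarks, per-cpu pagesets ... */

                spinlock_t      lock;           /* protects the buddy free lists */

                unsigned long   zone_start_pfn; /* first pfn the zone covers */
                unsigned long   spanned_pages;  /* pages spanned, including holes */
                unsigned long   present_pages;  /* pages actually present */

                /* ... */
        };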

The current memory hotplug code has concentrated on appending memory to
existing zones, which means just increasing spanned_pages.  There is
currently no code that breaks at runtime if this value is simply
incremented.

However, some cases exist where memory will be added to the beginning
of (or before) an existing zone.  That means that zone_start_pfn will
need to change.  Now, consider what happens if this code:

        static int bad_range(struct zone *zone, struct page *page)
        {
                if (page_to_pfn(page) >= 
                    zone->zone_start_pfn + zone->spanned_pages)
                        return 1;
        ...

is run while zone_start_pfn has already been decremented, but before
zone->spanned_pages has been incremented.  In that window,
zone_start_pfn + spanned_pages is smaller than the zone's real end, so
pages near the top of the zone spuriously look out of range and trip
the BUG_ON()s in bad_range()'s callers.
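
To make the window concrete, here is a toy userspace illustration of
the intermediate state (the pfn values are invented for the example;
this is not kernel code):

        /* toy illustration, not kernel code; pfn values invented */
        #include <stdio.h>

        int main(void)
        {
                /* zone spans pfns 0x1000..0x1fff */
                unsigned long zone_start_pfn = 0x1000;
                unsigned long spanned_pages  = 0x1000;
                unsigned long pfn = 0x1f00;  /* a page already in the zone */

                /* hotplug adds 0x100 pages in front: the start drops first... */
                zone_start_pfn = 0xf00;

                /*
                 * ...and until spanned_pages is bumped to 0x1100, a racing
                 * bad_range() computes the end as 0xf00 + 0x1000 = 0x1f00,
                 * so this perfectly valid pfn fails the check.
                 */
                if (pfn >= zone_start_pfn + spanned_pages)
                        printf("pfn %#lx transiently looks out of range\n", pfn);

                return 0;
        }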

bad_range() has four callers: __free_pages_bulk() (twice), expand(),
and buffered_rmqueue().  Of these, all but buffered_rmqueue() do the
bad_range() call under zone->lock.

So, one idea I had was to move just the zone-boundary (spanned_pages)
part of bad_range() under zone->lock, and to hold the lock only during
the part of a hotplug operation where the resize itself occurs (patch
appended).  This will, however, increase lock hold times.  I'm
currently running performance tests to see if I can detect this.
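
For illustration, the writer side of that scheme would look roughly
like this (the helper's name and arguments are made up here, not taken
from the appended patch):

        /*
         * Hypothetical hotplug-side resize: both fields are updated while
         * holding zone->lock, so a boundary check done under the same lock
         * never sees a half-updated pair.
         */
        static void zone_resize_span(struct zone *zone,
                                     unsigned long new_start_pfn,
                                     unsigned long new_spanned_pages)
        {
                unsigned long flags;

                spin_lock_irqsave(&zone->lock, flags);
                zone->zone_start_pfn = new_start_pfn;
                zone->spanned_pages  = new_spanned_pages;
                spin_unlock_irqrestore(&zone->lock, flags);
        }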

Another idea I had was to use memory barriers.  But, that's quite a bit
more complex than a single lock, and I think it has the potential to be
even more expensive.

Perhaps a seqlock which is conditional on memory hotplug?
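
Roughly, that might look like the sketch below; span_seqlock is a zone
field I am inventing for the example (it does not exist today), and the
reader reuses the page_outside_zone_boundaries() name that the appended
patch introduces:

        /* reader: redo the boundary test if a resize raced with it */
        static int page_outside_zone_boundaries(struct zone *zone,
                                                struct page *page)
        {
                unsigned long pfn = page_to_pfn(page);
                unsigned seq;
                int ret;

                do {
                        seq = read_seqbegin(&zone->span_seqlock);
                        ret = pfn < zone->zone_start_pfn ||
                              pfn >= zone->zone_start_pfn + zone->spanned_pages;
                } while (read_seqretry(&zone->span_seqlock, seq));

                return ret;
        }

        /* writer: the hotplug resize wraps both updates in the seqlock */
        static void zone_resize_span(struct zone *zone,
                                     unsigned long start_pfn,
                                     unsigned long spanned)
        {
                write_seqlock(&zone->span_seqlock);
                zone->zone_start_pfn = start_pfn;
                zone->spanned_pages  = spanned;
                write_sequnlock(&zone->span_seqlock);
        }

The seqlock read/write helpers could be wrapped so they compile away
when CONFIG_MEMORY_HOTPLUG is not set, keeping the cost out of the
common case.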

Any other ideas?

-- Dave


When doing memory hotplug operations, the size of existing zones can
obviously change.  This means that zone->zone_{start_pfn,spanned_pages}
can change.

There are currently no locks that protect these structure members.
However, they are rarely accessed at runtime.  Outside of swsusp, the
only place I can find that reads them is bad_range().

bad_range() has four callers: __free_pages_bulk() (twice), expand(),
and buffered_rmqueue().  Of these, all but buffered_rmqueue() do the
bad_range() call under zone->lock.



---

 memhotplug-dave/mm/page_alloc.c |   30 ++++++++++++++++++++++++++----
 1 files changed, 26 insertions(+), 4 deletions(-)

diff -puN mm/page_alloc.c~bad_range-rework mm/page_alloc.c
--- memhotplug/mm/page_alloc.c~bad_range-rework	2005-05-12 15:36:04.000000000 -0700
+++ memhotplug-dave/mm/page_alloc.c	2005-05-12 16:07:07.000000000 -0700
@@ -76,20 +76,40 @@ unsigned long __initdata nr_kernel_pages
 unsigned long __initdata nr_all_pages;
 
 /*
- * Temporary debugging check for pages not lying within a given zone.
+ * Due to memory hotplug, this function needs to be called
+ * under zone->lock.
  */
-static int bad_range(struct zone *zone, struct page *page)
+static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {
 	if (page_to_pfn(page) >= zone->zone_start_pfn + zone->spanned_pages)
 		return 1;
 	if (page_to_pfn(page) < zone->zone_start_pfn)
 		return 1;
+
+	return 0;
+}
+
+static int page_is_consistent(struct zone *zone, struct page *page)
+{
 #ifdef CONFIG_HOLES_IN_ZONE
 	if (!pfn_valid(page_to_pfn(page)))
-		return 1;
+		return 0;
 #endif
 	if (zone != page_zone(page))
+		return 0;
+
+	return 1;
+}
+/*
+ * Temporary debugging check for pages not lying within a given zone.
+ */
+static int bad_range(struct zone *zone, struct page *page)
+{
+	if (page_outside_zone_boundaries(zone, page))
 		return 1;
+	if (!page_is_consistent(zone, page))
+		return 1;
+
 	return 0;
 }
 
@@ -502,6 +522,7 @@ static int rmqueue_bulk(struct zone *zon
 		page = __rmqueue(zone, order);
 		if (page == NULL)
 			break;
+		BUG_ON(page_outside_zone_boundaries(zone, page));
 		allocated++;
 		list_add_tail(&page->lru, list);
 	}
@@ -674,11 +695,12 @@ buffered_rmqueue(struct zone *zone, int 
 	if (page == NULL) {
 		spin_lock_irqsave(&zone->lock, flags);
 		page = __rmqueue(zone, order);
+		BUG_ON(page && page_outside_zone_boundaries(zone, page));
 		spin_unlock_irqrestore(&zone->lock, flags);
 	}
 
 	if (page != NULL) {
-		BUG_ON(bad_range(zone, page));
+		BUG_ON(!page_is_consistent(zone, page));
 		mod_page_state_zone(zone, pgalloc, 1 << order);
 		prep_new_page(page, order);
 
_


