Subject: [RFC] consistency of zone->zone_start_pfn, spanned_pages
From: Dave Hansen
Date: Fri, 13 May 2005 09:00:19 -0700
Message-Id: <1116000019.32433.10.camel@localhost>
To: lhms
Cc: linux-mm

The zone struct has a few important members which explicitly define the
range of pages that it manages: zone_start_pfn and spanned_pages.

The current memory hotplug code has concentrated on appending memory to
existing zones, which means just increasing spanned_pages.  There is
currently no code that breaks at runtime if this value is simply
incremented.

However, some cases exist where memory will be added to the beginning
of (or before) an existing zone.  That means that zone_start_pfn will
need to change.  Now, consider what happens if this code:

static int bad_range(struct zone *zone, struct page *page)
{
	if (page_to_pfn(page) >= zone->zone_start_pfn + zone->spanned_pages)
		return 1;
	...

is run while zone_start_pfn is being decremented, but before
zone->spanned_pages has been incremented.

bad_range() has four callers: __free_pages_bulk() (x2), expand() and
buffered_rmqueue().  Of these, all but buffered_rmqueue() do the
bad_range() call under zone->lock.

So, one idea I had was to move just the spanned_pages part of
bad_range() under the zone->lock, and hold the lock during just the
part of a hotplug operation where the resize occurs (patch appended; a
rough sketch of the resize side follows the patch).  This will,
however, increase lock hold times.  I'm currently running performance
tests to see if I can detect this.

Another idea I had was to use memory barriers.  But, that's quite a bit
more complex than a single lock, and I think it has the potential to be
even more expensive.  Perhaps a seqlock which is conditional on memory
hotplug?
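
For concreteness, here is a rough, untested sketch of what that might
look like.  The span_seqlock field and the zone_span_resize() helper
are hypothetical, invented purely for illustration:

/*
 * Sketch only: assumes a new seqlock_t span_seqlock member in struct
 * zone, which could be compiled out when memory hotplug is not
 * configured.
 */
#include <linux/mm.h>
#include <linux/seqlock.h>

/* Writer side: a hotplug resize bumps the sequence around the update. */
static void zone_span_resize(struct zone *zone, unsigned long start_pfn,
			     unsigned long spanned_pages)
{
	write_seqlock(&zone->span_seqlock);
	zone->zone_start_pfn = start_pfn;
	zone->spanned_pages = spanned_pages;
	write_sequnlock(&zone->span_seqlock);
}

/* Reader side: retry the boundary check if a resize raced with it. */
static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
{
	unsigned long pfn = page_to_pfn(page);
	unsigned seq;
	int ret;

	do {
		seq = read_seqbegin(&zone->span_seqlock);
		ret = pfn >= zone->zone_start_pfn + zone->spanned_pages ||
		      pfn < zone->zone_start_pfn;
	} while (read_seqretry(&zone->span_seqlock, seq));

	return ret;
}

The reader is lockless (just two reads of the sequence counter in the
common case), which avoids the lock-hold-time concern, at the cost of a
retry loop and a new field in struct zone.
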
Any other ideas?

-- Dave


When doing memory hotplug operations, the size of existing zones can
obviously change.  This means that zone->zone_{start_pfn,spanned_pages}
can change.

There are currently no locks that protect these structure members.
However, they are rarely accessed at runtime.  Outside of swsusp, the
only place that I can find is bad_range().

bad_range() has four callers: __free_pages_bulk() (x2), expand() and
buffered_rmqueue().  Of these, all but buffered_rmqueue() do the
bad_range() call under zone->lock.

---

 memhotplug-dave/mm/page_alloc.c |   30 ++++++++++++++++++++++++++----
 1 files changed, 26 insertions(+), 4 deletions(-)

diff -puN mm/page_alloc.c~bad_range-rework mm/page_alloc.c
--- memhotplug/mm/page_alloc.c~bad_range-rework	2005-05-12 15:36:04.000000000 -0700
+++ memhotplug-dave/mm/page_alloc.c	2005-05-12 16:07:07.000000000 -0700
@@ -76,20 +76,40 @@ unsigned long __initdata nr_kernel_pages
 unsigned long __initdata nr_all_pages;
 
 /*
- * Temporary debugging check for pages not lying within a given zone.
+ * Due to memory hotplug, this function needs to be called
+ * under zone->lock.
  */
-static int bad_range(struct zone *zone, struct page *page)
+static int page_outside_zone_boundaries(struct zone *zone, struct page *page)
 {
 	if (page_to_pfn(page) >= zone->zone_start_pfn + zone->spanned_pages)
 		return 1;
 	if (page_to_pfn(page) < zone->zone_start_pfn)
 		return 1;
+
+	return 0;
+}
+
+static int page_is_consistent(struct zone *zone, struct page *page)
+{
 #ifdef CONFIG_HOLES_IN_ZONE
 	if (!pfn_valid(page_to_pfn(page)))
-		return 1;
+		return 0;
 #endif
 	if (zone != page_zone(page))
+		return 0;
+
+	return 1;
+}
+/*
+ * Temporary debugging check for pages not lying within a given zone.
+ */
+static int bad_range(struct zone *zone, struct page *page)
+{
+	if (page_outside_zone_boundaries(zone, page))
 		return 1;
+	if (!page_is_consistent(zone, page))
+		return 1;
+
 	return 0;
 }
 
@@ -502,6 +522,7 @@ static int rmqueue_bulk(struct zone *zon
 		page = __rmqueue(zone, order);
 		if (page == NULL)
 			break;
+		BUG_ON(page_outside_zone_boundaries(zone, page));
 		allocated++;
 		list_add_tail(&page->lru, list);
 	}
@@ -674,11 +695,12 @@ buffered_rmqueue(struct zone *zone, int
 	if (page == NULL) {
 		spin_lock_irqsave(&zone->lock, flags);
 		page = __rmqueue(zone, order);
+		BUG_ON(page && page_outside_zone_boundaries(zone, page));
 		spin_unlock_irqrestore(&zone->lock, flags);
 	}
 
 	if (page != NULL) {
-		BUG_ON(bad_range(zone, page));
+		BUG_ON(!page_is_consistent(zone, page));
 		mod_page_state_zone(zone, pgalloc, 1 << order);
 		prep_new_page(page, order);
 
_
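
P.S.  The resize side referred to above is not part of the patch.  A
rough sketch of how a hotplug grow operation might update the zone span
under zone->lock follows; the zone_grow_span() name and its calling
convention are invented purely for illustration:

#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/spinlock.h>

/*
 * Sketch only.  Grow a zone to also cover [start_pfn, end_pfn) while
 * holding zone->lock, so that a boundary check done under the same lock
 * sees either the old span or the new one, never a half-updated pair of
 * zone_start_pfn/spanned_pages.
 */
static void zone_grow_span(struct zone *zone, unsigned long start_pfn,
			   unsigned long end_pfn)
{
	unsigned long flags;
	unsigned long old_end_pfn;

	spin_lock_irqsave(&zone->lock, flags);
	old_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
	if (start_pfn < zone->zone_start_pfn)
		zone->zone_start_pfn = start_pfn;
	if (end_pfn < old_end_pfn)
		end_pfn = old_end_pfn;
	zone->spanned_pages = end_pfn - zone->zone_start_pfn;
	spin_unlock_irqrestore(&zone->lock, flags);
}

With something like this on the writer side, the spanned_pages check is
only meaningful when the caller holds zone->lock, which is why the patch
above moves the boundary checks under zone->lock and leaves only the
page_is_consistent() check outside it.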