linux-mm.kvack.org archive mirror
From: Yinghai Lu <yinghai@kernel.org>
To: Tejun Heo <tj@kernel.org>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>,
	Sasha Levin <levinsasha928@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Miller <davem@davemloft.net>,
	hpa@linux.intel.com, linux-mm <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Early boot panic on machine with lots of memory
Date: Thu, 21 Jun 2012 18:47:24 -0700	[thread overview]
Message-ID: <CAE9FiQXubmnKHjnqOxVeoJknJZFNuStCcW=1XC6jLE7eznkTmg@mail.gmail.com> (raw)
In-Reply-To: <20120621201728.GB4642@google.com>

[-- Attachment #1: Type: text/plain, Size: 3421 bytes --]

On Thu, Jun 21, 2012 at 1:17 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Yinghai.
>
> On Tue, Jun 19, 2012 at 07:57:45PM -0700, Yinghai Lu wrote:
>> if that is the case, that change could fix the other problem too:
>> during one of the frees, reserved.regions could double the array.
>
> Yeah, that sounds much more attractive to me too.  Some comments on
> the patch tho.
>
>>  /**
>>   * memblock_double_array - double the size of the memblock regions array
>>   * @type: memblock type of the regions array being doubled
>> @@ -216,7 +204,7 @@ static int __init_memblock memblock_doub
>>
>>       /* Calculate new doubled size */
>>       old_size = type->max * sizeof(struct memblock_region);
>> -     new_size = old_size << 1;
>> +     new_size = PAGE_ALIGN(old_size << 1);
>
> We definitely could use a comment explaining why we want page
> alignment.  It's kinda subtle.

yes.

>
> This is a bit confusing here because old_size is the proper size
> without padding while new_size is page aligned size with possible
> padding.  Maybe discerning {old|new}_alloc_size is clearer?  Also, I
> think adding @new_cnt variable which is calculated together would make
> the code easier to follow.  So, sth like,
>
>        /* explain why page aligning is necessary */
>        old_size = type->max * sizeof(struct memblock_region);
>        old_alloc_size = PAGE_ALIGN(old_size);
>
>        new_max = type->max << 1;
>        new_size = new_max * sizeof(struct memblock_region);
>        new_alloc_size = PAGE_ALIGN(new_size);
>
> and use alloc_sizes for alloc/frees and sizes for everything else.

ok, will add new_alloc_size, old_alloc_size.

>
>>  unsigned long __init free_low_memory_core_early(int nodeid)
>>  {
>>       unsigned long count = 0;
>> -     phys_addr_t start, end;
>> +     phys_addr_t start, end, size;
>>       u64 i;
>>
>> -     /* free reserved array temporarily so that it's treated as free area */
>> -     memblock_free_reserved_regions();
>> +     for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
>> +             count += __free_memory_core(start, end);
>>
>> -     for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL) {
>> -             unsigned long start_pfn = PFN_UP(start);
>> -             unsigned long end_pfn = min_t(unsigned long,
>> -                                           PFN_DOWN(end), max_low_pfn);
>> -             if (start_pfn < end_pfn) {
>> -                     __free_pages_memory(start_pfn, end_pfn);
>> -                     count += end_pfn - start_pfn;
>> -             }
>> -     }
>> +     /* free range that is used for reserved array if we allocate it */
>> +     size = get_allocated_memblock_reserved_regions_info(&start);
>> +     if (size)
>> +             count += __free_memory_core(start, start + size);
>
> I'm afraid this is too early.  We don't want the region to be unmapped
> yet.  This should only happen after all memblock usages are finished
> which I don't think is the case yet.

No, it is not too early. By that point, all memblock usage is done.

Also, I tested one system with huge memory and reproduced on KVM the
problem that Sasha hit; my patch fixes it.

please check the attached patch.

I also added another patch that double-checks whether anything still
references reserved.regions; so far no such reference has been found.

Thanks

Yinghai

[-- Attachment #2: fix_free_memblock_reserve_v4_5.patch --]
[-- Type: application/octet-stream, Size: 6770 bytes --]

Subject: [PATCH] memblock: free allocated memblock_reserved_regions later

memblock_free_reserved_regions() calls memblock_free(), but
memblock_free() may in turn double the reserved.regions array, so we
could end up freeing the old range that backs reserved.regions itself.

Also, tj pointed out another issue related to this:

| I don't think we're saving any noticeable
| amount by doing this "free - give it to page allocator - reserve
| again" dancing.  We should just allocate regions aligned to page
| boundaries and free them later when memblock is no longer in use.

So allocate the array with PAGE_SIZE alignment and free it later, once
memblock is no longer in use.

-v5: use new_alloc_size and old_alloc_size to simplify the code, per tj's suggestion.

Cc: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 include/linux/memblock.h |    4 ---
 mm/memblock.c            |   51 +++++++++++++++++++++--------------------------
 mm/nobootmem.c           |   36 ++++++++++++++++++++-------------
 3 files changed, 46 insertions(+), 45 deletions(-)

Index: linux-2.6/include/linux/memblock.h
===================================================================
--- linux-2.6.orig/include/linux/memblock.h
+++ linux-2.6/include/linux/memblock.h
@@ -50,9 +50,7 @@ phys_addr_t memblock_find_in_range_node(
 				phys_addr_t size, phys_addr_t align, int nid);
 phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
 				   phys_addr_t size, phys_addr_t align);
-int memblock_free_reserved_regions(void);
-int memblock_reserve_reserved_regions(void);
-
+phys_addr_t get_allocated_memblock_reserved_regions_info(phys_addr_t *addr);
 void memblock_allow_resize(void);
 int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid);
 int memblock_add(phys_addr_t base, phys_addr_t size);
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -143,30 +143,6 @@ phys_addr_t __init_memblock memblock_fin
 					   MAX_NUMNODES);
 }
 
-/*
- * Free memblock.reserved.regions
- */
-int __init_memblock memblock_free_reserved_regions(void)
-{
-	if (memblock.reserved.regions == memblock_reserved_init_regions)
-		return 0;
-
-	return memblock_free(__pa(memblock.reserved.regions),
-		 sizeof(struct memblock_region) * memblock.reserved.max);
-}
-
-/*
- * Reserve memblock.reserved.regions
- */
-int __init_memblock memblock_reserve_reserved_regions(void)
-{
-	if (memblock.reserved.regions == memblock_reserved_init_regions)
-		return 0;
-
-	return memblock_reserve(__pa(memblock.reserved.regions),
-		 sizeof(struct memblock_region) * memblock.reserved.max);
-}
-
 static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
 {
 	type->total_size -= type->regions[r].size;
@@ -184,6 +160,18 @@ static void __init_memblock memblock_rem
 	}
 }
 
+phys_addr_t __init_memblock get_allocated_memblock_reserved_regions_info(
+					phys_addr_t *addr)
+{
+	if (memblock.reserved.regions == memblock_reserved_init_regions)
+		return 0;
+
+	*addr = __pa(memblock.reserved.regions);
+
+	return PAGE_ALIGN(sizeof(struct memblock_region) *
+			  memblock.reserved.max);
+}
+
 /**
  * memblock_double_array - double the size of the memblock regions array
  * @type: memblock type of the regions array being doubled
@@ -204,6 +192,7 @@ static int __init_memblock memblock_doub
 						phys_addr_t new_area_size)
 {
 	struct memblock_region *new_array, *old_array;
+	phys_addr_t old_alloc_size, new_alloc_size;
 	phys_addr_t old_size, new_size, addr;
 	int use_slab = slab_is_available();
 	int *in_slab;
@@ -217,6 +206,12 @@ static int __init_memblock memblock_doub
 	/* Calculate new doubled size */
 	old_size = type->max * sizeof(struct memblock_region);
 	new_size = old_size << 1;
+	/*
+	 * Allocate the new array aligned to PAGE_SIZE so that we can
+	 * free the whole allocation later, when memblock is done.
+	 */
+	old_alloc_size = PAGE_ALIGN(old_size);
+	new_alloc_size = PAGE_ALIGN(new_size);
 
 	/* Retrieve the slab flag */
 	if (type == &memblock.memory)
@@ -245,11 +240,11 @@ static int __init_memblock memblock_doub
 
 		addr = memblock_find_in_range(new_area_start + new_area_size,
 						memblock.current_limit,
-						new_size, sizeof(phys_addr_t));
+						new_alloc_size, PAGE_SIZE);
 		if (!addr && new_area_size)
 			addr = memblock_find_in_range(0,
 					min(new_area_start, memblock.current_limit),
-					new_size, sizeof(phys_addr_t));
+					new_alloc_size, PAGE_SIZE);
 
 		new_array = addr ? __va(addr) : 0;
 	}
@@ -279,13 +274,13 @@ static int __init_memblock memblock_doub
 		kfree(old_array);
 	else if (old_array != memblock_memory_init_regions &&
 		 old_array != memblock_reserved_init_regions)
-		memblock_free(__pa(old_array), old_size);
+		memblock_free(__pa(old_array), old_alloc_size);
 
 	/* Reserve the new array if that comes from the memblock.
 	 * Otherwise, we needn't do it
 	 */
 	if (!use_slab)
-		BUG_ON(memblock_reserve(addr, new_size));
+		BUG_ON(memblock_reserve(addr, new_alloc_size));
 
 	/* Update slab flag */
 	*in_slab = use_slab;
Index: linux-2.6/mm/nobootmem.c
===================================================================
--- linux-2.6.orig/mm/nobootmem.c
+++ linux-2.6/mm/nobootmem.c
@@ -105,27 +105,35 @@ static void __init __free_pages_memory(u
 		__free_pages_bootmem(pfn_to_page(i), 0);
 }
 
+static unsigned long __init __free_memory_core(phys_addr_t start,
+				 phys_addr_t end)
+{
+	unsigned long start_pfn = PFN_UP(start);
+	unsigned long end_pfn = min_t(unsigned long,
+				      PFN_DOWN(end), max_low_pfn);
+
+	if (start_pfn > end_pfn)
+		return 0;
+
+	__free_pages_memory(start_pfn, end_pfn);
+
+	return end_pfn - start_pfn;
+}
+
 unsigned long __init free_low_memory_core_early(int nodeid)
 {
 	unsigned long count = 0;
-	phys_addr_t start, end;
+	phys_addr_t start, end, size;
 	u64 i;
 
-	/* free reserved array temporarily so that it's treated as free area */
-	memblock_free_reserved_regions();
+	for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
+		count += __free_memory_core(start, end);
 
-	for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL) {
-		unsigned long start_pfn = PFN_UP(start);
-		unsigned long end_pfn = min_t(unsigned long,
-					      PFN_DOWN(end), max_low_pfn);
-		if (start_pfn < end_pfn) {
-			__free_pages_memory(start_pfn, end_pfn);
-			count += end_pfn - start_pfn;
-		}
-	}
+	/* Free the range used by the reserved array, if we allocated it */
+	size = get_allocated_memblock_reserved_regions_info(&start);
+	if (size)
+		count += __free_memory_core(start, start + size);
 
-	/* put region array back? */
-	memblock_reserve_reserved_regions();
 	return count;
 }
 

[-- Attachment #3: memblock_reserved_clear_check.patch --]
[-- Type: application/octet-stream, Size: 3926 bytes --]

Subject: [PATCH] memblock: add a check for illegal use of memblock.reserved

After memblock is no longer used, clear memblock.reserved so that any
later, illegal use of it is caught and warned about.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/init_32.c    |    3 +++
 arch/x86/mm/init_64.c    |    2 ++
 include/linux/memblock.h |    1 +
 mm/memblock.c            |   15 +++++++++++++++
 4 files changed, 21 insertions(+)

Index: linux-2.6/arch/x86/mm/init_32.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_32.c
+++ linux-2.6/arch/x86/mm/init_32.c
@@ -759,6 +759,9 @@ void __init mem_init(void)
 		if (page_is_ram(tmp) && PageReserved(pfn_to_page(tmp)))
 			reservedpages++;
 
+	/* clear reserved to catch wrong usage */
+	memblock_clear_reserved();
+
 	codesize =  (unsigned long) &_etext - (unsigned long) &_text;
 	datasize =  (unsigned long) &_edata - (unsigned long) &_etext;
 	initsize =  (unsigned long) &__init_end - (unsigned long) &__init_begin;
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -699,6 +699,8 @@ void __init mem_init(void)
 
 	absent_pages = absent_pages_in_range(0, max_pfn);
 	reservedpages = max_pfn - totalram_pages - absent_pages;
+	/* clear reserved to catch wrong usage */
+	memblock_clear_reserved();
 	after_bootmem = 1;
 
 	codesize =  (unsigned long) &_etext - (unsigned long) &_text;
Index: linux-2.6/include/linux/memblock.h
===================================================================
--- linux-2.6.orig/include/linux/memblock.h
+++ linux-2.6/include/linux/memblock.h
@@ -46,6 +46,7 @@ extern int memblock_debug;
 #define memblock_dbg(fmt, ...) \
 	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
 
+void memblock_clear_reserved(void);
 phys_addr_t memblock_find_in_range_node(phys_addr_t start, phys_addr_t end,
 				phys_addr_t size, phys_addr_t align, int nid);
 phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -101,6 +101,8 @@ phys_addr_t __init_memblock memblock_fin
 	phys_addr_t this_start, this_end, cand;
 	u64 i;
 
+	WARN_ONCE(!memblock.reserved.max, "memblock.reserved was cleared already!");
+
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
 		end = memblock.current_limit;
@@ -143,6 +145,14 @@ phys_addr_t __init_memblock memblock_fin
 					   MAX_NUMNODES);
 }
 
+/*
+ * Clear memblock.reserved
+ */
+void __init_memblock memblock_clear_reserved(void)
+{
+	memset(&memblock.reserved, 0, sizeof(memblock.reserved));
+}
+
 static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
 {
 	type->total_size -= type->regions[r].size;
@@ -535,6 +545,8 @@ int __init_memblock memblock_remove(phys
 
 int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
 {
+	WARN_ONCE(!memblock.reserved.max, "memblock.reserved was cleared already!");
+
 	memblock_dbg("   memblock_free: [%#016llx-%#016llx] %pF\n",
 		     (unsigned long long)base,
 		     (unsigned long long)base + size,
@@ -547,6 +559,7 @@ int __init_memblock memblock_reserve(phy
 {
 	struct memblock_type *_rgn = &memblock.reserved;
 
+	WARN_ONCE(!memblock.reserved.max, "memblock.reserved was cleared already!");
 	memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
 		     (unsigned long long)base,
 		     (unsigned long long)base + size,
@@ -587,6 +600,8 @@ void __init_memblock __next_free_mem_ran
 	int mi = *idx & 0xffffffff;
 	int ri = *idx >> 32;
 
+	WARN_ONCE(!rsv->max, "memblock.reserved was cleared already!");
+
 	for ( ; mi < mem->cnt; mi++) {
 		struct memblock_region *m = &mem->regions[mi];
 		phys_addr_t m_start = m->base;
