From: Yinghai Lu <yinghai@kernel.org>
To: Tejun Heo <tj@kernel.org>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>,
Sasha Levin <levinsasha928@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Miller <davem@davemloft.net>,
hpa@linux.intel.com, linux-mm <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Early boot panic on machine with lots of memory
Date: Thu, 21 Jun 2012 18:47:24 -0700 [thread overview]
Message-ID: <CAE9FiQXubmnKHjnqOxVeoJknJZFNuStCcW=1XC6jLE7eznkTmg@mail.gmail.com> (raw)
In-Reply-To: <20120621201728.GB4642@google.com>
[-- Attachment #1: Type: text/plain, Size: 3421 bytes --]
On Thu, Jun 21, 2012 at 1:17 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Yinghai.
>
> On Tue, Jun 19, 2012 at 07:57:45PM -0700, Yinghai Lu wrote:
>> if that is the case, that change could fix the other problem too
>> --- during that one free, reserved.regions could double the array.
>
> Yeah, that sounds much more attractive to me too. Some comments on
> the patch tho.
>
>> /**
>> * memblock_double_array - double the size of the memblock regions array
>> * @type: memblock type of the regions array being doubled
>> @@ -216,7 +204,7 @@ static int __init_memblock memblock_doub
>>
>> /* Calculate new doubled size */
>> old_size = type->max * sizeof(struct memblock_region);
>> - new_size = old_size << 1;
>> + new_size = PAGE_ALIGN(old_size << 1);
>
> We definitely can use some comments explaining why we want page
> alignment. It's kinda subtle.
yes.
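For the record, the subtle part, as I understand it, is that the array
eventually gets handed back through __free_memory_core(), which rounds
inward to whole pages, so a not-page-sized allocation would leave its
partial head/tail pages unfreed. A minimal sketch of the later free
path, matching the nobootmem.c hunk in the attached patch:

	/* size returned here is already PAGE_ALIGNed */
	size = get_allocated_memblock_reserved_regions_info(&start);
	if (size)
		count += __free_memory_core(start, start + size);

	/*
	 * __free_memory_core() uses PFN_UP(start)/PFN_DOWN(end); with a
	 * page-aligned, page-sized allocation the whole range goes back
	 * to the page allocator instead of leaking the edge pages.
	 */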
>
> This is a bit confusing here because old_size is the proper size
> without padding while new_size is page aligned size with possible
> padding. Maybe discerning {old|new}_alloc_size is clearer? Also, I
> think adding @new_cnt variable which is calculated together would make
> the code easier to follow. So, sth like,
>
> /* explain why page aligning is necessary */
> old_size = type->max * sizeof(struct memblock_region);
> old_alloc_size = PAGE_ALIGN(old_size);
>
> new_max = type->max << 1;
> new_size = new_max * sizeof(struct memblock_region);
> new_alloc_size = PAGE_ALIGN(new_size);
>
> and use alloc_sizes for alloc/frees and sizes for everything else.
ok, will add new_alloc_size and old_alloc_size.
>
>> unsigned long __init free_low_memory_core_early(int nodeid)
>> {
>> unsigned long count = 0;
>> - phys_addr_t start, end;
>> + phys_addr_t start, end, size;
>> u64 i;
>>
>> - /* free reserved array temporarily so that it's treated as free area */
>> - memblock_free_reserved_regions();
>> + for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
>> + count += __free_memory_core(start, end);
>>
>> - for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL) {
>> - unsigned long start_pfn = PFN_UP(start);
>> - unsigned long end_pfn = min_t(unsigned long,
>> - PFN_DOWN(end), max_low_pfn);
>> - if (start_pfn < end_pfn) {
>> - __free_pages_memory(start_pfn, end_pfn);
>> - count += end_pfn - start_pfn;
>> - }
>> - }
>> + /* free range that is used for reserved array if we allocate it */
>> + size = get_allocated_memblock_reserved_regions_info(&start);
>> + if (size)
>> + count += __free_memory_core(start, start + size);
>
> I'm afraid this is too early. We don't want the region to be unmapped
> yet. This should only happen after all memblock usages are finished
> which I don't think is the case yet.
No, it is not too early: by that time all memblock usage is done.
Also, I tested a system with huge memory and reproduced on KVM the
problem that Sasha hit; my patch fixes it.
Please check the attached patch.
I also added another patch that double-checks whether there is any
late reference to memblock.reserved; so far no such reference has
been found.
Thanks
Yinghai
[-- Attachment #2: fix_free_memblock_reserve_v4_5.patch --]
[-- Type: application/octet-stream, Size: 6770 bytes --]
Subject: [PATCH] memblock: free allocated memblock_reserved_regions later
memblock_free_reserved_regions() calls memblock_free(), but
memblock_free() itself may need to double the reserved.regions array,
and that doubling frees the old reserved.regions range while we are
still operating on it.
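Roughly, the problematic call chain in the old scheme is (a simplified
sketch, not a literal trace; function names as in mm/memblock.c):

  free_low_memory_core_early()
    memblock_free_reserved_regions()
      memblock_free(__pa(memblock.reserved.regions), ...)
        __memblock_remove(&memblock.reserved, base, size)
          memblock_isolate_range()
            /* may need up to two extra entries */
            memblock_double_array(&memblock.reserved, ...)
              /* allocates a new array and frees the old one --
               * the very range we are in the middle of freeing */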
tj also said there is another bug that could be related to this:
| I don't think we're saving any noticeable
| amount by doing this "free - give it to page allocator - reserve
| again" dancing. We should just allocate regions aligned to page
| boundaries and free them later when memblock is no longer in use.
So allocate the regions array page-aligned and page-sized, and free it
later, once memblock is no longer in use.
-v5: use new_alloc_size and old_alloc_size to simplify it, as suggested by tj.
Cc: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
include/linux/memblock.h | 4 ---
mm/memblock.c | 51 +++++++++++++++++++++--------------------------
mm/nobootmem.c | 36 ++++++++++++++++++++-------------
3 files changed, 46 insertions(+), 45 deletions(-)
Index: linux-2.6/include/linux/memblock.h
===================================================================
--- linux-2.6.orig/include/linux/memblock.h
+++ linux-2.6/include/linux/memblock.h
@@ -50,9 +50,7 @@ phys_addr_t memblock_find_in_range_node(
phys_addr_t size, phys_addr_t align, int nid);
phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
phys_addr_t size, phys_addr_t align);
-int memblock_free_reserved_regions(void);
-int memblock_reserve_reserved_regions(void);
-
+phys_addr_t get_allocated_memblock_reserved_regions_info(phys_addr_t *addr);
void memblock_allow_resize(void);
int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid);
int memblock_add(phys_addr_t base, phys_addr_t size);
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -143,30 +143,6 @@ phys_addr_t __init_memblock memblock_fin
MAX_NUMNODES);
}
-/*
- * Free memblock.reserved.regions
- */
-int __init_memblock memblock_free_reserved_regions(void)
-{
- if (memblock.reserved.regions == memblock_reserved_init_regions)
- return 0;
-
- return memblock_free(__pa(memblock.reserved.regions),
- sizeof(struct memblock_region) * memblock.reserved.max);
-}
-
-/*
- * Reserve memblock.reserved.regions
- */
-int __init_memblock memblock_reserve_reserved_regions(void)
-{
- if (memblock.reserved.regions == memblock_reserved_init_regions)
- return 0;
-
- return memblock_reserve(__pa(memblock.reserved.regions),
- sizeof(struct memblock_region) * memblock.reserved.max);
-}
-
static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
{
type->total_size -= type->regions[r].size;
@@ -184,6 +160,18 @@ static void __init_memblock memblock_rem
}
}
+phys_addr_t __init_memblock get_allocated_memblock_reserved_regions_info(
+ phys_addr_t *addr)
+{
+ if (memblock.reserved.regions == memblock_reserved_init_regions)
+ return 0;
+
+ *addr = __pa(memblock.reserved.regions);
+
+ return PAGE_ALIGN(sizeof(struct memblock_region) *
+ memblock.reserved.max);
+}
+
/**
* memblock_double_array - double the size of the memblock regions array
* @type: memblock type of the regions array being doubled
@@ -204,6 +192,7 @@ static int __init_memblock memblock_doub
phys_addr_t new_area_size)
{
struct memblock_region *new_array, *old_array;
+ phys_addr_t old_alloc_size, new_alloc_size;
phys_addr_t old_size, new_size, addr;
int use_slab = slab_is_available();
int *in_slab;
@@ -217,6 +206,12 @@ static int __init_memblock memblock_doub
/* Calculate new doubled size */
old_size = type->max * sizeof(struct memblock_region);
new_size = old_size << 1;
+ /*
+ * We need to allocate the new array aligned to PAGE_SIZE, so that it
+ * occupies whole pages and can be freed completely later, once
+ * memblock is no longer in use.
+ */
+ old_alloc_size = PAGE_ALIGN(old_size);
+ new_alloc_size = PAGE_ALIGN(new_size);
/* Retrieve the slab flag */
if (type == &memblock.memory)
@@ -245,11 +240,11 @@ static int __init_memblock memblock_doub
addr = memblock_find_in_range(new_area_start + new_area_size,
memblock.current_limit,
- new_size, sizeof(phys_addr_t));
+ new_alloc_size, PAGE_SIZE);
if (!addr && new_area_size)
addr = memblock_find_in_range(0,
min(new_area_start, memblock.current_limit),
- new_size, sizeof(phys_addr_t));
+ new_alloc_size, PAGE_SIZE);
new_array = addr ? __va(addr) : 0;
}
@@ -279,13 +274,13 @@ static int __init_memblock memblock_doub
kfree(old_array);
else if (old_array != memblock_memory_init_regions &&
old_array != memblock_reserved_init_regions)
- memblock_free(__pa(old_array), old_size);
+ memblock_free(__pa(old_array), old_alloc_size);
/* Reserve the new array if that comes from the memblock.
* Otherwise, we needn't do it
*/
if (!use_slab)
- BUG_ON(memblock_reserve(addr, new_size));
+ BUG_ON(memblock_reserve(addr, new_alloc_size));
/* Update slab flag */
*in_slab = use_slab;
Index: linux-2.6/mm/nobootmem.c
===================================================================
--- linux-2.6.orig/mm/nobootmem.c
+++ linux-2.6/mm/nobootmem.c
@@ -105,27 +105,35 @@ static void __init __free_pages_memory(u
__free_pages_bootmem(pfn_to_page(i), 0);
}
+static unsigned long __init __free_memory_core(phys_addr_t start,
+ phys_addr_t end)
+{
+ unsigned long start_pfn = PFN_UP(start);
+ unsigned long end_pfn = min_t(unsigned long,
+ PFN_DOWN(end), max_low_pfn);
+
+ if (start_pfn > end_pfn)
+ return 0;
+
+ __free_pages_memory(start_pfn, end_pfn);
+
+ return end_pfn - start_pfn;
+}
+
unsigned long __init free_low_memory_core_early(int nodeid)
{
unsigned long count = 0;
- phys_addr_t start, end;
+ phys_addr_t start, end, size;
u64 i;
- /* free reserved array temporarily so that it's treated as free area */
- memblock_free_reserved_regions();
+ for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
+ count += __free_memory_core(start, end);
- for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL) {
- unsigned long start_pfn = PFN_UP(start);
- unsigned long end_pfn = min_t(unsigned long,
- PFN_DOWN(end), max_low_pfn);
- if (start_pfn < end_pfn) {
- __free_pages_memory(start_pfn, end_pfn);
- count += end_pfn - start_pfn;
- }
- }
+ /* free the range used for the reserved regions array, if we allocated it */
+ size = get_allocated_memblock_reserved_regions_info(&start);
+ if (size)
+ count += __free_memory_core(start, start + size);
- /* put region array back? */
- memblock_reserve_reserved_regions();
return count;
}
[-- Attachment #3: memblock_reserved_clear_check.patch --]
[-- Type: application/octet-stream, Size: 3926 bytes --]
Subject: [PATCH] memblock: Add check for illegal use of memblock.reserved
After memblock is no longer used, clear memblock.reserved so that any
later (wrong) use of it is caught.
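The intended ordering on x86 is roughly (a sketch based on the hunks
below and on mm/nobootmem.c; not a literal trace):

  mem_init()
    free_all_bootmem()
      free_low_memory_core_early()  /* last legitimate user; also frees
                                       the allocated reserved array */
    memblock_clear_reserved()       /* zero out memblock.reserved */

Any later memblock_reserve()/memblock_free()/for_each_free_mem_range()
or memblock_find_in_range_node() then hits
WARN_ONCE(!memblock.reserved.max, ...).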
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/mm/init_32.c | 3 +++
arch/x86/mm/init_64.c | 2 ++
include/linux/memblock.h | 1 +
mm/memblock.c | 15 +++++++++++++++
4 files changed, 21 insertions(+)
Index: linux-2.6/arch/x86/mm/init_32.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_32.c
+++ linux-2.6/arch/x86/mm/init_32.c
@@ -759,6 +759,9 @@ void __init mem_init(void)
if (page_is_ram(tmp) && PageReserved(pfn_to_page(tmp)))
reservedpages++;
+ /* clear reserved to catch wrong usage */
+ memblock_clear_reserved();
+
codesize = (unsigned long) &_etext - (unsigned long) &_text;
datasize = (unsigned long) &_edata - (unsigned long) &_etext;
initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin;
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -699,6 +699,8 @@ void __init mem_init(void)
absent_pages = absent_pages_in_range(0, max_pfn);
reservedpages = max_pfn - totalram_pages - absent_pages;
+ /* clear reserved to catch wrong usage */
+ memblock_clear_reserved();
after_bootmem = 1;
codesize = (unsigned long) &_etext - (unsigned long) &_text;
Index: linux-2.6/include/linux/memblock.h
===================================================================
--- linux-2.6.orig/include/linux/memblock.h
+++ linux-2.6/include/linux/memblock.h
@@ -46,6 +46,7 @@ extern int memblock_debug;
#define memblock_dbg(fmt, ...) \
if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+void memblock_clear_reserved(void);
phys_addr_t memblock_find_in_range_node(phys_addr_t start, phys_addr_t end,
phys_addr_t size, phys_addr_t align, int nid);
phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -101,6 +101,8 @@ phys_addr_t __init_memblock memblock_fin
phys_addr_t this_start, this_end, cand;
u64 i;
+ WARN_ONCE(!memblock.reserved.max, "memblock.reserved was cleared already!");
+
/* pump up @end */
if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
end = memblock.current_limit;
@@ -143,6 +145,14 @@ phys_addr_t __init_memblock memblock_fin
MAX_NUMNODES);
}
+/*
+ * Clear memblock.reserved
+ */
+void __init_memblock memblock_clear_reserved(void)
+{
+ memset(&memblock.reserved, 0, sizeof(memblock.reserved));
+}
+
static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
{
type->total_size -= type->regions[r].size;
@@ -535,6 +545,8 @@ int __init_memblock memblock_remove(phys
int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
{
+ WARN_ONCE(!memblock.reserved.max, "memblock.reserved was cleared already!");
+
memblock_dbg(" memblock_free: [%#016llx-%#016llx] %pF\n",
(unsigned long long)base,
(unsigned long long)base + size,
@@ -547,6 +559,7 @@ int __init_memblock memblock_reserve(phy
{
struct memblock_type *_rgn = &memblock.reserved;
+ WARN_ONCE(!memblock.reserved.max, "memblock.reserved was cleared already!");
memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
(unsigned long long)base,
(unsigned long long)base + size,
@@ -587,6 +600,8 @@ void __init_memblock __next_free_mem_ran
int mi = *idx & 0xffffffff;
int ri = *idx >> 32;
+ WARN_ONCE(!rsv->max, "memblock.reserved was cleared already!");
+
for ( ; mi < mem->cnt; mi++) {
struct memblock_region *m = &mem->regions[mi];
phys_addr_t m_start = m->base;
Thread overview: 38+ messages
2012-06-13 21:38 Sasha Levin
2012-06-14 3:20 ` Tejun Heo
2012-06-14 9:50 ` Sasha Levin
2012-06-14 20:56 ` Yinghai Lu
2012-06-14 21:34 ` Sasha Levin
2012-06-14 23:57 ` Yinghai Lu
2012-06-15 0:59 ` Sasha Levin
2012-06-15 2:21 ` Yinghai Lu
2012-06-15 7:41 ` Sasha Levin
2012-06-18 22:32 ` Tejun Heo
2012-06-18 22:50 ` Sasha Levin
2012-06-19 4:11 ` Gavin Shan
2012-06-19 5:43 ` Yinghai Lu
2012-06-19 6:09 ` Gavin Shan
2012-06-19 18:12 ` Yinghai Lu
2012-06-19 21:20 ` Tejun Heo
2012-06-19 21:26 ` Tejun Heo
2012-06-20 2:57 ` Yinghai Lu
2012-06-21 20:17 ` Tejun Heo
2012-06-22 1:47 ` Yinghai Lu [this message]
2012-06-22 1:58 ` Yinghai Lu
2012-06-22 18:51 ` Tejun Heo
2012-06-22 19:23 ` Yinghai Lu
2012-06-22 19:29 ` Tejun Heo
2012-06-22 20:01 ` Yinghai Lu
2012-06-22 20:14 ` Tejun Heo
2012-06-22 20:23 ` Yinghai Lu
2012-06-23 2:14 ` Yinghai Lu
2012-06-27 18:13 ` Tejun Heo
2012-06-27 19:22 ` Yinghai Lu
2012-06-27 19:26 ` Tejun Heo
2012-06-27 21:15 ` Yinghai Lu
2012-06-29 18:27 ` [PATCH for -3.5] memblock: free allocated memblock_reserved_regions later Yinghai Lu
2012-06-29 18:32 ` Tejun Heo
2012-06-29 18:38 ` Yinghai Lu
2012-06-21 20:19 ` Early boot panic on machine with lots of memory Tejun Heo
2012-06-22 10:29 ` Sasha Levin
2012-06-22 18:15 ` Yinghai Lu