From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@osdl.org>
Cc: davej@codemonkey.org.uk, tony.luck@intel.com, ak@suse.de,
bob.picco@hp.com, linux-kernel@vger.kernel.org,
linuxppc-dev@ozlabs.org, linux-mm@kvack.org
Subject: Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes
Date: Sun, 21 May 2006 23:23:51 +0100 (IST) [thread overview]
Message-ID: <Pine.LNX.4.64.0605212219540.27379@skynet.skynet.ie> (raw)
In-Reply-To: <20060521120843.43babdc7.akpm@osdl.org>
On Sun, 21 May 2006, Andrew Morton wrote:
> Mel Gorman <mel@csn.ul.ie> wrote:
>>
>
>>> Anyway, I just don't get how this code can work. We have an e820 map with
>>> up to 128 entries (this machine has ten) and we're trying to scrunch that
>>> all into the four-entry early_node_map[].
>>>
>>
>> Missing E820MAX was a mistake. On x86_64, CONFIG_MAX_ACTIVE_REGIONS should
>> have been used. I didn't expect x86_64 to have so many memory holes.
>
> x86 uses 128 e820 slots too.
>
That is true, but with x86, I am not expecting many regions. For flatmem,
only one region will be registered. For NUMA, I would expect one
registration per node *unless* SRAT is being used. With SRAT, MAXCHUNKS
regions at most with is 4 * MAX_NUMNODES.
>>
>>> On my little x86 PC:
>>>
>>> BIOS-provided physical RAM map:
>>> BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
>>> BIOS-e820: 000000000009bc00 - 000000000009c000 (reserved)
>>> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
>>> BIOS-e820: 0000000000100000 - 000000000ffc0000 (usable)
>>> BIOS-e820: 000000000ffc0000 - 000000000fff8000 (ACPI data)
>>> BIOS-e820: 000000000fff8000 - 0000000010000000 (ACPI NVS)
>>> BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
>>> BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
>>> BIOS-e820: 00000000ffb80000 - 00000000ffc00000 (reserved)
>>> BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
>>> 0MB HIGHMEM available.
>>> 255MB LOWMEM available.
>>> found SMP MP-table at 000ff780
>>> Range (nid 0) 0 -> 65472, max 4
>>> On node 0 totalpages: 65472
>>> DMA zone: 4096 pages, LIFO batch:0
>>> Normal zone: 61376 pages, LIFO batch:15
>>>
>>> So here, the architecture code only called add_active_range() the once, for
>>> the entire memory map.
>>
>> Because in this case, the architecture reported that there was just one
>> range of available pages with no holes.
>
> So.. we're registering a simgle blob of pfns which includes the "reserved"
> memory as well as the "ACPI data" and the "ACPI NVS" (with an apparent
> off-by-one here).
>
The off-by-one is a surprise. On this machine, it must be because the
arch-specific code calculated highend_pfn wrong. I don't use the e820 on
i386 because it didn't seem necessary.
> How come the machine still works? I guess the architecture went and marked
> those pfns reserved.
>
Yes, that is what I'd expect to happen. The ranges are registered and a
memmap allocated but the freeing of memory from bootmem is still the same
on i386. For i386, my patchset reports the same size of zones and
start_pfn on each node so there should be no difference in the end result
between my code and the arch-specific initialisation.
>>> If so, perhaps the bug is that the x86_64 code isn't doing that. And that
>> > x86 isn't doing it for some people either.
>> >
>>
>> I'm hoping in this case that having MAX_ACTIVE_REGIONS match E820MAX will
>> fix the issue on your machine.
>
> I expect it will.
>
> One does wonder whether it's worth all this fuss though. It's only a
> 24-byte structure and it's all thrown away in free_initmem(). One _could_
> just go and do
>
> #define MAX_ACTIVE_REGIONS 10000
>
> and be happy.
>
I could, but I thought I'd be shot for trying something like that. A fixed
value of 128 would cover the largest tables I'm aware of on all
architectures. Should I just set that fixed value?
>> I'm still confused why Christian's failed
>> to boot with the patch backed out though.
>
> He didn't get any "Too many memory regions" messages, so it's something
> different.
>
> Maybe he hit my off-by-one on his "ACPI data"?
>
Possibly but the off-by-one error for you was on x86 not x86_64 and I
suspect that highend_pfn was wrong in this case. I'll be checking tomorrow
where I can see an off-by-one error.
> hm, I didn't mention this in the earlier email. On my x86 I have
>
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
> BIOS-e820: 000000000009bc00 - 000000000009c000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000000ffc0000 (usable)
> BIOS-e820: 000000000ffc0000 - 000000000fff8000 (ACPI data)
> BIOS-e820: 000000000fff8000 - 0000000010000000 (ACPI NVS)
> BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
> BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> BIOS-e820: 00000000ffb80000 - 00000000ffc00000 (reserved)
> BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
>
> I added some debug and saw that add_active_range() was getting a
> start_pfn=0 and an end_pfn which corresponds with 0x0fffc000. So my "ACPI
> NVS" is getting chopped off.
>
Yes. However, this just means that the memory for that the PFN range will
not be backed by memmap. This would only be a problem if free_bootmem() is
called on those range of pages. If that was happening, I would be
expecting oops early or bad_page reports during the boot process.
> If Christian is seeing a similar thing then his "ACPI data" will be getting
> only part-registered.
>
> I'd suggest that the next rev be liberal in its printking. This is the
> debug patch I used:
>
I also have an old debug patch that was very printk happy. I will dust it
off and add it with the additional information from your patch.
> mm/page_alloc.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff -puN mm/page_alloc.c~a mm/page_alloc.c
> --- devel/mm/page_alloc.c~a 2006-05-20 13:19:58.000000000 -0700
> +++ devel-akpm/mm/page_alloc.c 2006-05-20 13:20:42.000000000 -0700
> @@ -2463,22 +2463,36 @@ void __init add_active_range(unsigned in
> unsigned long end_pfn)
> {
> unsigned int i;
> - printk(KERN_DEBUG "Range (%d) %lu -> %lu\n", nid, start_pfn, end_pfn);
> +
> + printk("Range (nid %d) %lu -> %lu, max %d\n",
> + nid, start_pfn, end_pfn, MAX_ACTIVE_REGIONS - 1);
>
> /* Merge with existing active regions if possible */
> for (i = 0; early_node_map[i].end_pfn; i++) {
> - if (early_node_map[i].nid != nid)
> + printk("i=%d early_node_map[i].nid=%d "
> + "early_node_map[i].start_pfn=%lu "
> + "early_node_map[i].end_pfn=%lu",
> + i, early_node_map[i].nid,
> + early_node_map[i].start_pfn,
> + early_node_map[i].end_pfn);
> +
> + if (early_node_map[i].nid != nid) {
> + printk(" continue 1\n");
> continue;
> + }
>
> /* Skip if an existing region covers this new one */
> if (start_pfn >= early_node_map[i].start_pfn &&
> - end_pfn <= early_node_map[i].end_pfn)
> + end_pfn <= early_node_map[i].end_pfn) {
> + printk(" return 1\n");
> return;
> + }
>
> /* Merge forward if suitable */
> if (start_pfn <= early_node_map[i].end_pfn &&
> end_pfn > early_node_map[i].end_pfn) {
> early_node_map[i].end_pfn = end_pfn;
> + printk(" return 2\n");
> return;
> }
>
> @@ -2486,13 +2500,16 @@ void __init add_active_range(unsigned in
> if (start_pfn < early_node_map[i].end_pfn &&
> end_pfn >= early_node_map[i].start_pfn) {
> early_node_map[i].start_pfn = start_pfn;
> + printk(" return 3\n");
> return;
> }
> + printk("\n");
> }
>
> /* Leave last entry NULL, we use range.end_pfn to terminate the walk */
> if (i >= MAX_ACTIVE_REGIONS - 1) {
> - printk(KERN_ERR "Too many memory regions, truncating\n");
> + printk(KERN_ERR "More than %d memory regions, truncating\n",
> + MAX_ACTIVE_REGIONS - 1);
> return;
> }
>
> _
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-05-21 22:23 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-05-08 14:10 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V6 Mel Gorman
2006-05-08 14:10 ` [PATCH 1/6] Introduce mechanism for registering active regions of memory Mel Gorman
2006-05-08 14:11 ` [PATCH 2/6] Have Power use add_active_range() and free_area_init_nodes() Mel Gorman
2006-05-08 14:11 ` [PATCH 3/6] Have x86 use add_active_range() and free_area_init_nodes Mel Gorman
2006-05-08 14:11 ` [PATCH 4/6] Have x86_64 " Mel Gorman
2006-05-20 20:59 ` Andrew Morton
2006-05-20 21:27 ` Andi Kleen
2006-05-20 21:40 ` Andrew Morton
2006-05-20 22:17 ` Andi Kleen
2006-05-20 22:54 ` Andrew Morton
2006-05-21 16:20 ` Mel Gorman
2006-05-21 15:50 ` Mel Gorman
2006-05-21 19:08 ` Andrew Morton
2006-05-21 22:23 ` Mel Gorman [this message]
2006-05-23 18:01 ` Mel Gorman
2006-05-08 14:12 ` [PATCH 5/6] Have ia64 " Mel Gorman
2006-05-15 3:31 ` Andrew Morton
2006-05-15 8:21 ` Andy Whitcroft
2006-05-15 10:00 ` Nick Piggin
2006-05-15 10:19 ` Andy Whitcroft
2006-05-15 10:29 ` KAMEZAWA Hiroyuki
2006-05-15 10:47 ` KAMEZAWA Hiroyuki
2006-05-15 11:02 ` Andy Whitcroft
2006-05-16 0:31 ` Nick Piggin
2006-05-16 1:34 ` KAMEZAWA Hiroyuki
2006-05-16 2:11 ` Nick Piggin
2006-05-15 12:27 ` Mel Gorman
2006-05-15 22:44 ` Mel Gorman
2006-05-19 14:03 ` Mel Gorman
2006-05-19 14:23 ` Andy Whitcroft
2006-05-08 14:12 ` [PATCH 6/6] Break out memory initialisation code from page_alloc.c to mem_init.c Mel Gorman
2006-05-09 1:47 ` Nick Piggin
2006-05-09 8:24 ` Mel Gorman
2006-07-08 11:10 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V8 Mel Gorman
2006-07-08 11:12 ` [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes Mel Gorman
2006-08-21 13:45 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V9 Mel Gorman
2006-08-21 13:46 ` [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes Mel Gorman
2006-08-30 20:57 ` Keith Mannthey
2006-08-31 15:49 ` Mel Gorman
2006-08-31 16:25 ` Mika Penttilä
2006-08-31 17:01 ` Mel Gorman
2006-08-31 17:40 ` Mika Penttilä
2006-08-31 17:52 ` Keith Mannthey
2006-08-31 18:40 ` Mel Gorman
2006-09-01 3:08 ` Keith Mannthey
2006-09-01 8:33 ` Mel Gorman
2006-09-01 8:46 ` Mika Penttilä
2006-09-04 15:36 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0605212219540.27379@skynet.skynet.ie \
--to=mel@csn.ul.ie \
--cc=ak@suse.de \
--cc=akpm@osdl.org \
--cc=bob.picco@hp.com \
--cc=davej@codemonkey.org.uk \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox