* [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem
2006-12-05 12:45 [RFC][PATCH] vmemmap on sparsemem v2 KAMEZAWA Hiroyuki
@ 2006-12-05 12:49 ` KAMEZAWA Hiroyuki
2006-12-06 18:13 ` Heiko Carstens
2006-12-08 3:06 ` KAMEZAWA Hiroyuki
2006-12-05 12:53 ` [RFC][PATCH] vmemmap on sparsemem v2 [2/5] memory hotplug support KAMEZAWA Hiroyuki
` (4 subsequent siblings)
5 siblings, 2 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-12-05 12:49 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, clameter, apw
This patch implements the arch-independent part of the virtual mem_map for sparsemem.
Memory hotplug is not supported here. (It is added by a later patch.)
To use vmem_map/sparsemem, an arch has to:
* declare 'struct page *vmem_map' or 'struct page vmem_map[]' and set up its
value. asm/sparsemem.h is probably a suitable place, as the later ia64 patch does.
* set ARCH_SPARSEMEM_VMEMMAP in its Kconfig.
We can assume that the total size of the mem_map per section is aligned to
PAGE_SIZE. Because of this, sparsemem's pfn_valid() works fine.
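(A worked example of that alignment, using the ia64 numbers that come up
later in this thread: sizeof(struct page) = 56 and PAGES_PER_SECTION = 65536,
so a section's mem_map is 56 * 65536 = 3670016 bytes = 224 * 16384, an exact
multiple of the 16KB PAGE_SIZE.)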
This code has its own page-mapping routine only because it has to be called
before the page structs are available.
Consideration:
I know some people are trying to use large pages for the vmem_map. It seems
attractive, but this patch doesn't provide hooks for that.
Maybe rewriting map_virtual_mem_map() is enough (if you don't consider
memory hotplug).
IMO, a generic interface for mapping large pages in the kernel should be
discussed before doing such a special hack.
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
include/linux/mmzone.h | 8 +++
mm/Kconfig | 9 ++++
mm/sparse.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 113 insertions(+), 5 deletions(-)
Index: devel-2.6.19-rc6-mm2/mm/Kconfig
===================================================================
--- devel-2.6.19-rc6-mm2.orig/mm/Kconfig 2006-12-05 17:24:30.000000000 +0900
+++ devel-2.6.19-rc6-mm2/mm/Kconfig 2006-12-05 17:24:58.000000000 +0900
@@ -112,6 +112,15 @@
def_bool y
depends on SPARSEMEM && !SPARSEMEM_STATIC
+config SPARSEMEM_VMEMMAP
+ bool "virtual memmap support for sparsemem"
+ depends on SPARSEMEM && !SPARSEMEM_STATIC && ARCH_SPARSEMEM_VMEMMAP
+ help
+ If selected, sparsemem uses a virtually contiguous address range for
+ mem_map. Some sparsemem functions (pfn_to_page/page_to_pfn) become
+ very simple and fast, but this consumes a huge amount of virtual
+ address space.
+
# eventually, we can have this option just 'select SPARSEMEM'
config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
Index: devel-2.6.19-rc6-mm2/include/linux/mmzone.h
===================================================================
--- devel-2.6.19-rc6-mm2.orig/include/linux/mmzone.h 2006-12-05 17:24:28.000000000 +0900
+++ devel-2.6.19-rc6-mm2/include/linux/mmzone.h 2006-12-05 19:53:41.000000000 +0900
@@ -714,12 +714,23 @@
#define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1))
#define SECTION_NID_SHIFT 2
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+/*
+ * sparse_vmem_map_start is defined by each arch.
+ * vmem_map is declared by each arch.
+ */
+static inline struct page *__section_mem_map_addr(struct mem_section *section)
+{
+ return vmem_map;
+}
+#else
static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
unsigned long map = section->section_mem_map;
map &= SECTION_MAP_MASK;
return (struct page *)map;
}
+#endif
static inline int valid_section(struct mem_section *section)
{
Index: devel-2.6.19-rc6-mm2/mm/sparse.c
===================================================================
--- devel-2.6.19-rc6-mm2.orig/mm/sparse.c 2006-12-05 17:24:30.000000000 +0900
+++ devel-2.6.19-rc6-mm2/mm/sparse.c 2006-12-05 19:53:13.000000000 +0900
@@ -9,6 +9,7 @@
#include <linux/spinlock.h>
#include <linux/vmalloc.h>
#include <asm/dma.h>
+#include <asm/pgalloc.h>
/*
* Permanent SPARSEMEM data:
@@ -99,6 +100,105 @@
}
#endif
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+
+static void* __init pte_alloc_vmem_map(int node)
+{
+ return alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE);
+}
+
+/*
+ * We can expect that the mem_map within a section is always contiguous.
+ */
+static unsigned long
+__init sparse_phys_mem_map_get(unsigned long section,
+ unsigned long vmap,
+ int node)
+{
+ struct mem_section *ms = __nr_to_section(section);
+ unsigned long map = ms->section_mem_map & SECTION_MAP_MASK;
+ unsigned long vmap_start;
+
+ vmap_start = (unsigned long)pfn_to_page(section_nr_to_pfn(section));
+
+ if (system_state == SYSTEM_BOOTING) {
+ unsigned long offset;
+ map = (unsigned long)((struct page*)(map) +
+ section_nr_to_pfn(section));
+ offset = (vmap - vmap_start) >> PAGE_SHIFT;
+ map = __pa(map);
+ return (map >> PAGE_SHIFT) + offset;
+ }
+ BUG(); /* handled by memory hotplug */
+}
+
+/*
+ * map_virtual_mem_map() maps the virtual mem_map of a section, getting
+ * the pfn backing each virtual page from sparse_phys_mem_map_get().
+ * Returns 1 on success, -ENOMEM on failure.
+ */
+static int __meminit map_virtual_mem_map(unsigned long section, int node)
+{
+ unsigned long vmap_start, vmap_end, vmap;
+ void *pg;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ vmap_start = (unsigned long)pfn_to_page(section_nr_to_pfn(section));
+ vmap_end = vmap_start + PAGES_PER_SECTION * sizeof(struct page);
+
+ for (vmap = vmap_start;
+ vmap != vmap_end;
+ vmap += PAGE_SIZE)
+ {
+ pgd = pgd_offset_k(vmap);
+ if (pgd_none(*pgd)) {
+ pg = pte_alloc_vmem_map(node);
+ if (!pg)
+ goto error_out;
+ pgd_populate(&init_mm, pgd, pg);
+ }
+ pud = pud_offset(pgd, vmap);
+ if (pud_none(*pud)) {
+ pg = pte_alloc_vmem_map(node);
+ if (!pg)
+ goto error_out;
+ pud_populate(&init_mm, pud, pg);
+ }
+ pmd = pmd_offset(pud, vmap);
+ if (pmd_none(*pmd)) {
+ pg = pte_alloc_vmem_map(node);
+ if (!pg)
+ goto error_out;
+ pmd_populate_kernel(&init_mm, pmd, pg);
+ }
+ pte = pte_offset_kernel(pmd, vmap);
+ if (pte_none(*pte)) {
+ unsigned long pfn;
+ pfn = sparse_phys_mem_map_get(section, vmap, node);
+ if (!pfn)
+ goto error_out;
+ set_pte(pte, pfn_pte(pfn, PAGE_KERNEL));
+ }
+ }
+ flush_cache_vmap(vmap_start, vmap_end);
+ return 1;
+error_out:
+ return -ENOMEM;
+}
+
+#else
+
+static inline int map_virtual_mem_map(unsigned long section, int node)
+{
+ return 1;
+}
+
+#endif
+
/*
* Although written for the SPARSEMEM_EXTREME case, this happens
* to also work for the flat array case because
@@ -198,15 +298,14 @@
}
static int sparse_init_one_section(struct mem_section *ms,
- unsigned long pnum, struct page *mem_map)
+ unsigned long pnum, struct page *mem_map, int nid)
{
if (!valid_section(ms))
return -EINVAL;
ms->section_mem_map &= ~SECTION_MAP_MASK;
ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum);
-
- return 1;
+ return map_virtual_mem_map(pnum, nid);
}
static struct page *sparse_early_mem_map_alloc(unsigned long pnum)
@@ -284,7 +383,8 @@
map = sparse_early_mem_map_alloc(pnum);
if (!map)
continue;
- sparse_init_one_section(__nr_to_section(pnum), pnum, map);
+ sparse_init_one_section(__nr_to_section(pnum), pnum, map,
+ sparse_early_nid(__nr_to_section(pnum)));
}
}
@@ -319,7 +419,7 @@
}
ms->section_mem_map |= SECTION_MARKED_PRESENT;
- ret = sparse_init_one_section(ms, section_nr, memmap);
+ ret = sparse_init_one_section(ms, section_nr, memmap, pgdat->node_id);
out:
pgdat_resize_unlock(pgdat, &flags);
* Re: [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem
2006-12-05 12:49 ` [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem KAMEZAWA Hiroyuki
@ 2006-12-06 18:13 ` Heiko Carstens
2006-12-06 18:17 ` Christoph Lameter
2006-12-08 3:06 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 20+ messages in thread
From: Heiko Carstens @ 2006-12-06 18:13 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, clameter, apw
> We can assume that the total size of the mem_map per section is aligned to PAGE_SIZE.
[...]
> +static int __meminit map_virtual_mem_map(unsigned long section, int node)
> +{
> + unsigned long vmap_start, vmap_end, vmap;
> + void *pg;
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *pte;
> +
> + vmap_start = (unsigned long)pfn_to_page(section_nr_to_pfn(section));
> + vmap_end = vmap_start + PAGES_PER_SECTION * sizeof(struct page);
> +
> + for (vmap = vmap_start;
> + vmap != vmap_end;
> + vmap += PAGE_SIZE)
> + {
Hmm.. maybe I'm just too tired. But why does this work? Why is vmap_start
PAGE_SIZE aligned and why is vmap_end PAGE_SIZE aligned too?
* Re: [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem
2006-12-06 18:13 ` Heiko Carstens
@ 2006-12-06 18:17 ` Christoph Lameter
2006-12-07 0:20 ` KAMEZAWA Hiroyuki
2006-12-07 10:06 ` Heiko Carstens
0 siblings, 2 replies; 20+ messages in thread
From: Christoph Lameter @ 2006-12-06 18:17 UTC (permalink / raw)
To: Heiko Carstens; +Cc: KAMEZAWA Hiroyuki, linux-mm, clameter, apw
On Wed, 6 Dec 2006, Heiko Carstens wrote:
> > + vmap_start = (unsigned long)pfn_to_page(section_nr_to_pfn(section));
> > + vmap_end = vmap_start + PAGES_PER_SECTION * sizeof(struct page);
> > +
> > + for (vmap = vmap_start;
> > + vmap != vmap_end;
> > + vmap += PAGE_SIZE)
> > + {
>
> Hmm.. maybe I'm just too tired. But why does this work? Why is vmap_start
> PAGE_SIZE aligned and why is vmap_end PAGE_SIZE aligned too?
vmap_start is page aligned because pfn_to_page returns a page address.
Pages are page aligned.
vmap_end is only page aligned if sizeof(struct page) and PAGES_PER_SECTION
play nicely together. Which may not be the case on 64 bit platforms where
sizeof(struct page) is not a power of two.
* Re: [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem
2006-12-06 18:17 ` Christoph Lameter
@ 2006-12-07 0:20 ` KAMEZAWA Hiroyuki
2006-12-07 0:20 ` Christoph Lameter
2006-12-07 10:11 ` Heiko Carstens
2006-12-07 10:06 ` Heiko Carstens
1 sibling, 2 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-12-07 0:20 UTC (permalink / raw)
To: Christoph Lameter; +Cc: heiko.carstens, linux-mm, clameter, apw
On Wed, 6 Dec 2006 10:17:04 -0800 (PST)
Christoph Lameter <clameter@sgi.com> wrote:
> On Wed, 6 Dec 2006, Heiko Carstens wrote:
>
> > > + vmap_start = (unsigned long)pfn_to_page(section_nr_to_pfn(section));
> > > + vmap_end = vmap_start + PAGES_PER_SECTION * sizeof(struct page);
> > > +
> > > + for (vmap = vmap_start;
> > > + vmap != vmap_end;
> > > + vmap += PAGE_SIZE)
> > > + {
> >
> > Hmm.. maybe I'm just too tired. But why does this work? Why is vmap_start
> > PAGE_SIZE aligned and why is vmap_end PAGE_SIZE aligned too?
>
> vmap_start is page aligned because pfn_to_page returns a page address.
> Pages are page aligned.
>
> vmap_end is only page aligned if sizeof(struct page) and PAGES_PER_SECTION
> play nicely together. Which may not be the case on 64 bit platforms where
> sizeof(struct page) is not a power of two.
>
Now (on ia64, for example) sizeof(struct page) = 56 and PAGES_PER_SECTION = 65536.
Then sizeof(struct page) * PAGES_PER_SECTION is page-aligned (16KB pages).
This trick is very useful. But yes, if PAGES_PER_SECTION goes down to a very
small size this is no longer true. I hope every arch keeps this assumption.
Thanks,
-Kame
* Re: [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem
2006-12-07 0:20 ` KAMEZAWA Hiroyuki
@ 2006-12-07 0:20 ` Christoph Lameter
2006-12-07 10:11 ` Heiko Carstens
1 sibling, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2006-12-07 0:20 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: heiko.carstens, linux-mm, clameter, apw
On Thu, 7 Dec 2006, KAMEZAWA Hiroyuki wrote:
> Now (on ia64, for example) sizeof(struct page) = 56 and PAGES_PER_SECTION = 65536.
> Then sizeof(struct page) * PAGES_PER_SECTION is page-aligned (16KB pages).
Ahhh. Neat trick.
* Re: [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem
2006-12-07 0:20 ` KAMEZAWA Hiroyuki
2006-12-07 0:20 ` Christoph Lameter
@ 2006-12-07 10:11 ` Heiko Carstens
2006-12-07 10:50 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 20+ messages in thread
From: Heiko Carstens @ 2006-12-07 10:11 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Christoph Lameter, linux-mm, clameter, apw
> > vmap_end is only page aligned if sizeof(struct page) and PAGES_PER_SECTION
> > play nicely together. Which may not be the case on 64 bit platforms where
> > sizeof(struct page) is not a power of two.
> >
> Now (on ia64, for example) sizeof(struct page) = 56 and PAGES_PER_SECTION = 65536.
> Then sizeof(struct page) * PAGES_PER_SECTION is page-aligned (16KB pages).
sizeof(struct page) also depends on at least two CONFIG options. I don't
think it's a good idea to assume that everything is page aligned just
because it works right now, and only with certain kernel configurations...
At least the kernel build should fail if your assumptions are no longer
true.
* Re: [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem
2006-12-07 10:11 ` Heiko Carstens
@ 2006-12-07 10:50 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-12-07 10:50 UTC (permalink / raw)
To: Heiko Carstens; +Cc: clameter, linux-mm, clameter, apw
On Thu, 7 Dec 2006 11:11:56 +0100
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > vmap_end is only page aligned if sizeof(struct page) and PAGES_PER_SECTION
> > > play nicely together. Which may not be the case on 64 bit platforms where
> > > sizeof(struct page) is not a power of two.
> > >
> > Now (on ia64, for example) sizeof(struct page) = 56 and PAGES_PER_SECTION = 65536.
> > Then sizeof(struct page) * PAGES_PER_SECTION is page-aligned (16KB pages).
>
> sizeof(struct page) also depends on at least two CONFIG options. I don't
> think it's a good idea to assume that everything is page aligned just
> because it works right now, and only with certain kernel configurations...
> At least the kernel build should fail if your assumptions are no longer
> true.
>
I'll add #error and check it. thanks.
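(Note: #if cannot evaluate sizeof(), so a BUILD_BUG_ON()-style compile-time
assertion is the usual tool here. A minimal sketch; placing it at the top of
sparse_init() is an assumption, since BUILD_BUG_ON() must live inside a
function:)

	/* fail the build if a section's mem_map is not a whole number of pages */
	BUILD_BUG_ON((PAGES_PER_SECTION * sizeof(struct page)) % PAGE_SIZE != 0);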
-Kame
* Re: [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem
2006-12-06 18:17 ` Christoph Lameter
2006-12-07 0:20 ` KAMEZAWA Hiroyuki
@ 2006-12-07 10:06 ` Heiko Carstens
2006-12-07 10:17 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 20+ messages in thread
From: Heiko Carstens @ 2006-12-07 10:06 UTC (permalink / raw)
To: Christoph Lameter; +Cc: KAMEZAWA Hiroyuki, linux-mm, clameter, apw
On Wed, Dec 06, 2006 at 10:17:04AM -0800, Christoph Lameter wrote:
> On Wed, 6 Dec 2006, Heiko Carstens wrote:
>
> > > + vmap_start = (unsigned long)pfn_to_page(section_nr_to_pfn(section));
> > > + vmap_end = vmap_start + PAGES_PER_SECTION * sizeof(struct page);
> > > +
> > > + for (vmap = vmap_start;
> > > + vmap != vmap_end;
> > > + vmap += PAGE_SIZE)
> > > + {
> >
> > Hmm.. maybe I'm just too tired. But why does this work? Why is vmap_start
> > PAGE_SIZE aligned and why is vmap_end PAGE_SIZE aligned too?
>
> vmap_start is page aligned because pfn_to_page returns a page address.
> Pages are page aligned.
I must be dreaming... I always thought pfn_to_page returns the address of
the belonging 'struct page'... and indeed it does. So there is nothing
that guarantees that this is page aligned.
* Re: [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem
2006-12-07 10:06 ` Heiko Carstens
@ 2006-12-07 10:17 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-12-07 10:17 UTC (permalink / raw)
To: Heiko Carstens; +Cc: clameter, linux-mm, clameter, apw
On Thu, 7 Dec 2006 11:06:59 +0100
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> On Wed, Dec 06, 2006 at 10:17:04AM -0800, Christoph Lameter wrote:
> > On Wed, 6 Dec 2006, Heiko Carstens wrote:
> >
> > > > + vmap_start = (unsigned long)pfn_to_page(section_nr_to_pfn(section));
> > > > + vmap_end = vmap_start + PAGES_PER_SECTION * sizeof(struct page);
> > > > +
> > > > + for (vmap = vmap_start;
> > > > + vmap != vmap_end;
> > > > + vmap += PAGE_SIZE)
> > > > + {
> > >
> > > Hmm.. maybe I'm just too tired. But why does this work? Why is vmap_start
> > > PAGE_SIZE aligned and why is vmap_end PAGE_SIZE aligned too?
> >
> > vmap_start is page aligned because pfn_to_page returns a page address.
> > Pages are page aligned.
>
> I must be dreaming... I always though pfn_to_page return the address to
> the beloging 'struct page'... and indeed it does. So there is nothing
> that guarantees that this is page aligned.
>
I assume that the page struct of the first page of each sparsemem section is
always aligned to PAGE_SIZE, so this is safe.
ia64 example:
sizeof(struct page) = 56 bytes
PAGES_PER_SECTION = 65536
PAGE_SIZE = 16384
56 * 65536 % 16384 = 0.
- Kame
* Re: [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem
2006-12-05 12:49 ` [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem KAMEZAWA Hiroyuki
2006-12-06 18:13 ` Heiko Carstens
@ 2006-12-08 3:06 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-12-08 3:06 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, clameter, apw
On Tue, 5 Dec 2006 21:49:02 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +/*
> + * sparse_vmem_map_start is defined by each arch.
> + * vmem_map is declared by each arch.
> + */
> +static inline struct page *__section_mem_map_addr(struct mem_section *section)
> +{
> + return vmem_map;
> +}
> +#else
I confirmed that this style adds one memory access (a load). I'll go back to the
#define pfn_to_page(pfn) (mem_map + pfn)
style.
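(A minimal sketch of that style for the vmemmap case, assuming vmem_map is an
array whose address is fixed at link time, as patch [3/5] arranges for ia64,
so the base is a constant and the extra load disappears; page_to_pfn is the
inverse:)

#define pfn_to_page(pfn)	(vmem_map + (pfn))
#define page_to_pfn(page)	((unsigned long)((page) - vmem_map))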
Thanks,
-Kame
* [RFC][PATCH] vmemmap on sparsemem v2 [2/5] memory hotplug support
2006-12-05 12:45 [RFC][PATCH] vmemmap on sparsemem v2 KAMEZAWA Hiroyuki
2006-12-05 12:49 ` [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem KAMEZAWA Hiroyuki
@ 2006-12-05 12:53 ` KAMEZAWA Hiroyuki
2006-12-05 12:59 ` [RFC][PATCH] vmemmap on sparsemem v2 [3/5] ia64 vmemmap on sparsemem KAMEZAWA Hiroyuki
` (3 subsequent siblings)
5 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-12-05 12:53 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, clameter, apw
This patch adds memory hotplug support to sparsemem_vmemmap.
It implements on-demand mem_map allocation and an unmap routine (currently
used for rollback when allocation fails).
Not so complicated.
This patch defines an unmap routine that works only for the vmem_map, which
doesn't look good. But there is no generic routine for freeing mapped pages
at unmap time (for the kernel).
I'm also thinking of allocating the mem_map for a hot-added section from the
section itself, as some special page.
When I find a cleaner way, I'll fix this.
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: devel-2.6.19-rc6-mm2/mm/sparse.c
===================================================================
--- devel-2.6.19-rc6-mm2.orig/mm/sparse.c 2006-12-05 19:45:52.000000000 +0900
+++ devel-2.6.19-rc6-mm2/mm/sparse.c 2006-12-05 19:48:48.000000000 +0900
@@ -10,6 +10,7 @@
#include <linux/vmalloc.h>
#include <asm/dma.h>
#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
/*
* Permanent SPARSEMEM data:
@@ -103,22 +104,30 @@
#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static void* __init pte_alloc_vmem_map(int node)
+static void* __meminit pte_alloc_vmem_map(int node)
{
- return alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE);
+ struct page *page;
+ if (system_state == SYSTEM_BOOTING)
+ return alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE);
+
+ page = alloc_pages_node(node, GFP_KERNEL|__GFP_ZERO, 0);
+ if (!page)
+ return NULL;
+ return page_address(page);
}
/*
* We can expect that the mem_map within a section is always contiguous.
*/
static unsigned long
-__init sparse_phys_mem_map_get(unsigned long section,
- unsigned long vmap,
- int node)
+__meminit sparse_phys_mem_map_get(unsigned long section,
+ unsigned long vmap,
+ int node)
{
struct mem_section *ms = __nr_to_section(section);
unsigned long map = ms->section_mem_map & SECTION_MAP_MASK;
unsigned long vmap_start;
+ struct page *page;
vmap_start = (unsigned long)pfn_to_page(section_nr_to_pfn(section));
@@ -130,7 +139,11 @@
map = __pa(map);
return (map >> PAGE_SHIFT) + offset;
}
- BUG(); /* handled by memory hotplug */
+
+ page = alloc_pages_node(node, GFP_KERNEL|__GFP_ZERO, 0);
+ if (!page)
+ return 0;
+ return page_to_pfn(page);
}
/*
@@ -190,6 +203,81 @@
return -ENOMEM;
}
+/*
+ * This function does the same ops as vunmap(), except that it also frees the mapped pages.
+ */
+static void
+unmap_virtual_mem_map_pte(pmd_t *pmd, unsigned long addr, unsigned long end)
+{
+ pte_t *pte;
+ unsigned long pfn;
+ struct page *page;
+ pte = pte_offset_kernel(pmd, addr);
+ do {
+ WARN_ON(!pte_none(*pte) && !pte_present(*pte));
+ pfn = pte_pfn(*pte);
+ page = pfn_to_page(pfn);
+ if (!PageReserved(page)) {
+ pte_clear(&init_mm, addr, pte);
+ __free_page(page);
+ } else {
+ /* allocated at boot; never reached until
+ memory hot-unplug is implemented. */
+ BUG();
+ }
+ } while(pte++, addr += PAGE_SIZE, addr != end);
+}
+
+static void
+unmap_virtual_mem_map_pmd(pud_t *pud, unsigned long addr, unsigned long end)
+{
+ pmd_t *pmd;
+ unsigned long next;
+
+ pmd = pmd_offset(pud, addr);
+ do {
+ next = pmd_addr_end(addr, end);
+ if (pmd_none_or_clear_bad(pmd))
+ continue;
+ unmap_virtual_mem_map_pte(pmd, addr, next);
+ } while (pmd++, addr = next, addr != end);
+}
+static void
+unmap_virtual_mem_map_pud(pgd_t *pgd, unsigned long addr,unsigned long end)
+{
+ pud_t *pud;
+ unsigned long next;
+ pud = pud_offset(pgd, addr);
+ do {
+ next = pud_addr_end(addr, end);
+ if (pud_none_or_clear_bad(pud))
+ continue;
+ unmap_virtual_mem_map_pmd(pud, addr, next);
+ } while (pud++, addr = next, addr != end);
+}
+
+static void unmap_virtual_mem_map(int section)
+{
+ unsigned long start_addr, addr, end_addr, next;
+ unsigned long size = PAGES_PER_SECTION * sizeof(struct page);
+ pgd_t *pgd;
+ start_addr = (unsigned long)pfn_to_page(section_nr_to_pfn(section));
+ end_addr = start_addr + size;
+ addr = start_addr;
+
+ pgd = pgd_offset_k(start_addr);
+ flush_cache_vunmap(start_addr, end_addr);
+ do {
+ next = pgd_addr_end(addr, end_addr);
+ if (pgd_none_or_clear_bad(pgd))
+ continue;
+ unmap_virtual_mem_map_pud(pgd, addr, next);
} while (pgd++, addr = next, addr != end_addr);
+ flush_tlb_kernel_range(start_addr, end_addr);
+
+ return;
+}
+
#else
static inline int map_virtual_mem_map(unsigned long section, int node)
@@ -328,6 +416,18 @@
return NULL;
}
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
+{
+ /* we allocate mem_map later */
+ return NULL;
+}
+static void __kfree_section_memmap(int section_nr,
+ struct page *memmap, unsigned long nr_pages)
+{
+ unmap_virtual_mem_map(section_nr);
+}
+#else
static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
{
struct page *page, *ret;
@@ -358,7 +458,8 @@
return 0;
}
-static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
+static void __kfree_section_memmap(int section_nr,
+ struct page *memmap, unsigned long nr_pages)
{
if (vaddr_in_vmalloc_area(memmap))
vfree(memmap);
@@ -366,7 +467,7 @@
free_pages((unsigned long)memmap,
get_order(sizeof(struct page) * nr_pages));
}
-
+#endif
/*
* Allocate the accumulated non-linear sections, allocate a mem_map
* for each and record the physical to section mapping.
@@ -424,6 +525,6 @@
out:
pgdat_resize_unlock(pgdat, &flags);
if (ret <= 0)
- __kfree_section_memmap(memmap, nr_pages);
+ __kfree_section_memmap(section_nr, memmap, nr_pages);
return ret;
}
* [RFC][PATCH] vmemmap on sparsemem v2 [3/5] ia64 vmemmap on sparsemem
2006-12-05 12:45 [RFC][PATCH] vmemmap on sparsemem v2 KAMEZAWA Hiroyuki
2006-12-05 12:49 ` [RFC][PATCH] vmemmap on sparsemem v2 [1/5] generic vmemmap on sparsemem KAMEZAWA Hiroyuki
2006-12-05 12:53 ` [RFC][PATCH] vmemmap on sparsemem v2 [2/5] memory hotplug support KAMEZAWA Hiroyuki
@ 2006-12-05 12:59 ` KAMEZAWA Hiroyuki
2006-12-08 1:09 ` KAMEZAWA Hiroyuki
2006-12-05 13:09 ` [RFC][PATCH] vmemmap on sparsemem v2 [4/5] optimized pfn_valid KAMEZAWA Hiroyuki
` (2 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-12-05 12:59 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, clameter, apw
This patch declares the definitions needed for ia64 vmem_map/sparsemem.
Because ia64 uses SPARSEMEM_EXTREME, the benefit of vmem_map is big.
The address of vmem_map is defined as a fixed value.
The important definitions are in asm/sparsemem.h.
I thank Christoph-san for his help.
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
arch/ia64/Kconfig | 4 ++++
arch/ia64/kernel/vmlinux.lds.S | 5 ++++-
arch/ia64/mm/init.c | 4 ++++
include/asm-ia64/pgtable.h | 18 +++++++++++++-----
include/asm-ia64/sparsemem.h | 9 +++++++++
5 files changed, 34 insertions(+), 6 deletions(-)
Index: devel-2.6.19-rc6-mm2/include/asm-ia64/pgtable.h
===================================================================
--- devel-2.6.19-rc6-mm2.orig/include/asm-ia64/pgtable.h 2006-12-05 20:20:47.000000000 +0900
+++ devel-2.6.19-rc6-mm2/include/asm-ia64/pgtable.h 2006-12-05 20:21:05.000000000 +0900
@@ -230,12 +230,20 @@
#define set_pte(ptep, pteval) (*(ptep) = (pteval))
#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+#if defined(CONFIG_SPARSEMEM_VMEMMAP)
+/* sparsemem always allocates the maximum-size virtual mem_map */
+#define VMALLOC_START (VIRTUAL_MEM_MAP + VIRTUAL_MEM_MAP_SIZE)
+#define VMALLOC_END (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
+
+#elif defined(CONFIG_VIRTUAL_MEM_MAP)
+ /* for flatmem/discontigmem the size of vmem_map depends on memory size. */
+#define VMALLOC_START (RGN_BASE(RGN_GATE) + 0x200000000UL)
+#define VMALLOC_END_INIT (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
+#define VMALLOC_END vmalloc_end
+extern unsigned long vmalloc_end;
+
+#else /* don't use any kind of VIRTUAL_MEM_MAP */
#define VMALLOC_START (RGN_BASE(RGN_GATE) + 0x200000000UL)
-#ifdef CONFIG_VIRTUAL_MEM_MAP
-# define VMALLOC_END_INIT (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
-# define VMALLOC_END vmalloc_end
- extern unsigned long vmalloc_end;
-#else
# define VMALLOC_END (RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
#endif
Index: devel-2.6.19-rc6-mm2/include/asm-ia64/sparsemem.h
===================================================================
--- devel-2.6.19-rc6-mm2.orig/include/asm-ia64/sparsemem.h 2006-12-05 20:20:47.000000000 +0900
+++ devel-2.6.19-rc6-mm2/include/asm-ia64/sparsemem.h 2006-12-05 21:07:07.000000000 +0900
@@ -16,5 +16,17 @@
#endif
#endif
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+
+
+#define VIRTUAL_MEM_MAP (RGN_BASE(RGN_GATE) + 0x200000000UL)
+#define VIRTUAL_MEM_MAP_SIZE ((1UL << (MAX_PHYSMEM_BITS - PAGE_SHIFT)) * sizeof(struct page))
+
+/* fixed at compile time */
+#ifndef __ASSEMBLY__
+extern struct page vmem_map[];
+#endif
+
+#endif
#endif /* CONFIG_SPARSEMEM */
#endif /* _ASM_IA64_SPARSEMEM_H */
Index: devel-2.6.19-rc6-mm2/arch/ia64/Kconfig
===================================================================
--- devel-2.6.19-rc6-mm2.orig/arch/ia64/Kconfig 2006-12-05 20:20:47.000000000 +0900
+++ devel-2.6.19-rc6-mm2/arch/ia64/Kconfig 2006-12-05 21:25:30.000000000 +0900
@@ -345,6 +345,10 @@
def_bool y
depends on ARCH_DISCONTIGMEM_ENABLE
+config ARCH_SPARSEMEM_VMEMMAP
+ def_bool y
+ depends on ARCH_SPARSEMEM_ENABLE
+
config ARCH_DISCONTIGMEM_DEFAULT
def_bool y if (IA64_SGI_SN2 || IA64_GENERIC || IA64_HP_ZX1 || IA64_HP_ZX1_SWIOTLB)
depends on ARCH_DISCONTIGMEM_ENABLE
Index: devel-2.6.19-rc6-mm2/arch/ia64/mm/init.c
===================================================================
--- devel-2.6.19-rc6-mm2.orig/arch/ia64/mm/init.c 2006-12-05 20:20:47.000000000 +0900
+++ devel-2.6.19-rc6-mm2/arch/ia64/mm/init.c 2006-12-05 20:21:05.000000000 +0900
@@ -44,6 +44,9 @@
extern void ia64_tlb_init (void);
unsigned long MAX_DMA_ADDRESS = PAGE_OFFSET + 0x100000000UL;
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+EXPORT_SYMBOL(vmem_map); /* has a fixed value */
+#endif
#ifdef CONFIG_VIRTUAL_MEM_MAP
unsigned long vmalloc_end = VMALLOC_END_INIT;
@@ -52,6 +55,7 @@
EXPORT_SYMBOL(vmem_map);
#endif
+
struct page *zero_page_memmap_ptr; /* map entry for zero page */
EXPORT_SYMBOL(zero_page_memmap_ptr);
Index: devel-2.6.19-rc6-mm2/arch/ia64/kernel/vmlinux.lds.S
===================================================================
--- devel-2.6.19-rc6-mm2.orig/arch/ia64/kernel/vmlinux.lds.S 2006-12-04 14:30:03.000000000 +0900
+++ devel-2.6.19-rc6-mm2/arch/ia64/kernel/vmlinux.lds.S 2006-12-05 20:32:15.000000000 +0900
@@ -2,6 +2,7 @@
#include <asm/cache.h>
#include <asm/ptrace.h>
#include <asm/system.h>
+#include <asm/sparsemem.h>
#include <asm/pgtable.h>
#define LOAD_OFFSET (KERNEL_START - KERNEL_TR_PAGE_SIZE)
@@ -34,6 +35,7 @@
v = PAGE_OFFSET; /* this symbol is here to make debugging easier... */
phys_start = _start - LOAD_OFFSET;
+ vmem_map = VIRTUAL_MEM_MAP;
code : { } :code
. = KERNEL_START;
* Re: [RFC][PATCH] vmemmap on sparsemem v2 [3/5] ia64 vmemmap on sparsemem
2006-12-05 12:59 ` [RFC][PATCH] vmemmap on sparsemem v2 [3/5] ia64 vmemmap on sparsemem KAMEZAWA Hiroyuki
@ 2006-12-08 1:09 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-12-08 1:09 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, clameter, apw
On Tue, 5 Dec 2006 21:59:05 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> +/* fixed at compile time */
> +#ifndef __ASSEMBLY__
> +extern struct page vmem_map[];
> +#endif
> +
I'm sorry: this cannot be compiled by gcc 4.0, because 'struct page' is not
declared at that point. I'll move this, or use a pointer ('struct page *vmem_map').
Thanks,
-Kame
* [RFC][PATCH] vmemmap on sparsemem v2 [4/5] optimized pfn_valid
2006-12-05 12:45 [RFC][PATCH] vmemmap on sparsemem v2 KAMEZAWA Hiroyuki
` (2 preceding siblings ...)
2006-12-05 12:59 ` [RFC][PATCH] vmemmap on sparsemem v2 [3/5] ia64 vmemmap on sparsemem KAMEZAWA Hiroyuki
@ 2006-12-05 13:09 ` KAMEZAWA Hiroyuki
2006-12-05 13:10 ` [RFC][PATCH] vmemmap on sparsemem v2 [5/5] optimized pfn_valid support for ia64 KAMEZAWA Hiroyuki
2006-12-10 13:37 ` [RFC][PATCH] vmemmap on sparsemem v2 Andy Whitcroft
5 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-12-05 13:09 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, clameter, apw
This implements pfn_valid() the way ia64's vmem_map does.
This eliminates the access to the mem_section[] array for the usual ops.
Because the vmemmap on sparsemem is aligned, the access-check function can
be simpler than ia64's; a sketch follows below.
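(A minimal sketch of the resulting check; the function name is hypothetical,
the range test against the fixed vmem_map window is omitted for brevity, and
it assumes the arch fixes up faulting kernel accesses via its exception
table, which is what the patch [5/5] hook arranges on ia64:)

static inline int pfn_valid_sketch(unsigned long pfn)
{
	char byte;
	struct page *pg = pfn_to_page(pfn);	/* vmem_map + pfn */

	/* A hole in the virtual mem_map can begin only at a section
	 * boundary, and a section's mem_map is a PAGE_SIZE-aligned,
	 * exact multiple of sizeof(struct page). So no page struct
	 * straddles a hole, and probing its first byte is enough. */
	return __get_user(byte, (char __user *)pg) == 0;
}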
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
include/linux/mmzone.h | 14 ++++++++++++++
mm/sparse.c | 23 +++++++++++++++++++++++
2 files changed, 37 insertions(+)
Index: devel-2.6.19-rc6-mm2/include/linux/mmzone.h
===================================================================
--- devel-2.6.19-rc6-mm2.orig/include/linux/mmzone.h 2006-12-05 21:25:43.000000000 +0900
+++ devel-2.6.19-rc6-mm2/include/linux/mmzone.h 2006-12-05 21:45:21.000000000 +0900
@@ -752,12 +752,27 @@
return __nr_to_section(pfn_to_section_nr(pfn));
}
+#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_USE_OPT_PFN_VALID)
+/*
+ * Uses hardware assist instead of mem_section[] table walking.
+ * good for SPARSEMEM_EXTREME
+ * To use this, you may need arch support in page fault handler.
+ */
+static inline int pfn_valid(unsigned long pfn)
+{
+ struct page *pg = pfn_to_page(pfn);
+ return ((struct page *)VIRTUAL_MEM_MAP <= pg &&
+ pg < (struct page *)(VIRTUAL_MEM_MAP + VIRTUAL_MEM_MAP_SIZE) &&
+ check_valid_memmap(pg));
+}
+#else
static inline int pfn_valid(unsigned long pfn)
{
if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
return 0;
return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}
+#endif
/*
* These are _only_ used during initialisation, therefore they
Index: devel-2.6.19-rc6-mm2/mm/sparse.c
===================================================================
--- devel-2.6.19-rc6-mm2.orig/mm/sparse.c 2006-12-05 21:25:43.000000000 +0900
+++ devel-2.6.19-rc6-mm2/mm/sparse.c 2006-12-05 21:47:21.000000000 +0900
@@ -103,6 +103,22 @@
#ifdef CONFIG_SPARSEMEM_VMEMMAP
+#ifdef CONFIG_USE_OPT_PFN_VALID
+#include <asm/uaccess.h> /* for __get_user() */
+
+/*
+ * Check whether a mem_map page is valid by accessing it. Because the
+ * virtual mem_map on sparsemem is always aligned, a single __get_user()
+ * probe is sufficient.
+ */
+int check_valid_memmap(struct page *pg)
+{
+ char byte;
+ if (__get_user(byte, (char __user*)pg) == 0)
+ return 1;
+ return 0;
+}
+
+EXPORT_SYMBOL(check_valid_memmap);
+#endif
static void* __meminit pte_alloc_vmem_map(int node)
{
* [RFC][PATCH] vmemmap on sparsemem v2 [5/5] optimized pfn_valid support for ia64
2006-12-05 12:45 [RFC][PATCH] vmemmap on sparsemem v2 KAMEZAWA Hiroyuki
` (3 preceding siblings ...)
2006-12-05 13:09 ` [RFC][PATCH] vmemmap on sparsemem v2 [4/5] optimized pfn_valid KAMEZAWA Hiroyuki
@ 2006-12-05 13:10 ` KAMEZAWA Hiroyuki
2006-12-10 13:37 ` [RFC][PATCH] vmemmap on sparsemem v2 Andy Whitcroft
5 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-12-05 13:10 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, clameter, apw
USE_OPT_PFN_VALID support for ia64.
Because ia64 already has its own VIRTUAL_MEM_MAP handling, this patch is
simple.
When porting this to other archs, you have to add a hook in the page fault
handler and write a function like ia64's mapped_kernel_page_is_present();
see the sketch below.
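(For reference, a paraphrased sketch of what such a function amounts to; the
real one is ia64's mapped_kernel_page_is_present() in arch/ia64/mm/init.c,
and this is from memory, not the exact code. It walks the kernel page table
and reports whether the pte backing the address is present:)

static int kernel_page_is_present_sketch(unsigned long address)
{
	pgd_t *pgd = pgd_offset_k(address);	/* kernel page table walk */
	pud_t *pud;
	pmd_t *pmd;
	pte_t *ptep;

	if (pgd_none(*pgd) || pgd_bad(*pgd))
		return 0;
	pud = pud_offset(pgd, address);
	if (pud_none(*pud) || pud_bad(*pud))
		return 0;
	pmd = pmd_offset(pud, address);
	if (pmd_none(*pmd) || pmd_bad(*pmd))
		return 0;
	ptep = pte_offset_kernel(pmd, address);
	if (!ptep)
		return 0;
	return pte_present(*ptep);	/* present -> mem_map access is safe */
}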
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
arch/ia64/Kconfig | 4 ++++
arch/ia64/mm/fault.c | 4 ++--
2 files changed, 6 insertions(+), 2 deletions(-)
Index: devel-2.6.19-rc6-mm2/arch/ia64/Kconfig
===================================================================
--- devel-2.6.19-rc6-mm2.orig/arch/ia64/Kconfig 2006-12-05 20:41:55.000000000 +0900
+++ devel-2.6.19-rc6-mm2/arch/ia64/Kconfig 2006-12-05 20:45:34.000000000 +0900
@@ -349,6 +349,10 @@
def_bool y
depends on ARCH_SPARSEMEM_ENABLE
+config USE_OPT_PFN_VALID
+ def_bool y
+ depends on SPARSEMEM_VMEMMAP
+
config ARCH_DISCONTIGMEM_DEFAULT
def_bool y if (IA64_SGI_SN2 || IA64_GENERIC || IA64_HP_ZX1 || IA64_HP_ZX1_SWIOTLB)
depends on ARCH_DISCONTIGMEM_ENABLE
Index: devel-2.6.19-rc6-mm2/arch/ia64/mm/fault.c
===================================================================
--- devel-2.6.19-rc6-mm2.orig/arch/ia64/mm/fault.c 2006-12-05 20:41:55.000000000 +0900
+++ devel-2.6.19-rc6-mm2/arch/ia64/mm/fault.c 2006-12-05 20:45:34.000000000 +0900
@@ -103,7 +103,7 @@
if (in_atomic() || !mm)
goto no_context;
-#ifdef CONFIG_VIRTUAL_MEM_MAP
+#if defined(CONFIG_VIRTUAL_MEM_MAP) || defined(CONFIG_USE_OPT_PFN_VALID)
/*
* If fault is in region 5 and we are in the kernel, we may already
* have the mmap_sem (pfn_valid macro is called during mmap). There
@@ -211,7 +211,7 @@
bad_area:
up_read(&mm->mmap_sem);
-#ifdef CONFIG_VIRTUAL_MEM_MAP
+#if defined(CONFIG_VIRTUAL_MEM_MAP) || defined(CONFIG_USE_OPT_PFN_VALID)
bad_area_no_up:
#endif
if ((isr & IA64_ISR_SP)
* Re: [RFC][PATCH] vmemmap on sparsemem v2
2006-12-05 12:45 [RFC][PATCH] vmemmap on sparsemem v2 KAMEZAWA Hiroyuki
` (4 preceding siblings ...)
2006-12-05 13:10 ` [RFC][PATCH] vmemmap on sparsemem v2 [5/5] optimized pfn_valid support for ia64 KAMEZAWA Hiroyuki
@ 2006-12-10 13:37 ` Andy Whitcroft
2006-12-10 15:19 ` Heiko Carstens
5 siblings, 1 reply; 20+ messages in thread
From: Andy Whitcroft @ 2006-12-10 13:37 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Linux-MM, Christoph Lameter, heiko.carstens
KAMEZAWA Hiroyuki wrote:
> Hi, these are the patches for the virtual mem_map on sparsemem.
>
> The virtual mem_map will reduce costs of page_to_pfn/pfn_to_page of
> SPARSEMEM_EXTREME.
>
> I posted this series in October but haven't been able to update it since.
> I rewrote the whole patch set and reflected comments from Christoph-san and Andy-san.
> tested on ia64/tiger4.
>
> Changes v1 -> v2:
> - support memory hotplug case.
> - uses static address for vmem_map (ia64)
> - added optimized pfn_valid() for ia64 (experimental)
>
> consists of 5 patches:
> 1.. generic vmemmap_sparsemem
> 2.. memory hotplug support
> 3.. ia64 vmemmap_sparsemem definitions
> 4.. optimized pfn_valid (experimental)
> 5.. changes for pfn_valid (experimental)
>
> I don't manage large-page-size vmem_map in this series to keep patches simple.
> maybe I need more study to implement it in clean way.
>
> This patch is against 2.6.19-rc6-mm2, and I'll rebase this to the next -mm
> (possibly). So this patch is just for RFC.
>
> Any comments are welcome.
> -Kame
Sorry, I started reviewing v2 and out comes v3 :).
I have to say that I have generally been a virtual memmap sceptic. It
seems complex, and any testing I have done or seen doesn't show any
noticeable performance benefit.
That said, I do like the general thrust of this patch set. There is
basically no architecture-specific component to this implementation
other than specifying the base address. This seems worthy of testing
(and I see akpm has already slurped this up). Good.
Would we expect to see this replace the existing ia64 implementation in
the long term? I'd hate to see us having competing implementations
here. Also, Heiko, would this framework work with your s390 requirements
for vmem_map? I know that you have a particularly challenging physical
layout. It would be great to see just one of these in the kernel.
-apw
* Re: [RFC][PATCH] vmemmap on sparsemem v2
2006-12-10 13:37 ` [RFC][PATCH] vmemmap on sparsemem v2 Andy Whitcroft
@ 2006-12-10 15:19 ` Heiko Carstens
2006-12-11 1:09 ` KAMEZAWA Hiroyuki
2006-12-11 17:23 ` Christoph Lameter
0 siblings, 2 replies; 20+ messages in thread
From: Heiko Carstens @ 2006-12-10 15:19 UTC (permalink / raw)
To: Andy Whitcroft
Cc: KAMEZAWA Hiroyuki, Linux-MM, Christoph Lameter, Martin Schwidefsky
> I have to say that I have generally been a virtual memmap sceptic. It
> seems complex, and any testing I have done or seen doesn't show any
> noticeable performance benefit.
>
> That said, I do like the general thrust of this patch set. There is
> basically no architecture-specific component to this implementation
> other than specifying the base address. This seems worthy of testing
> (and I see akpm has already slurped this up). Good.
>
> Would we expect to see this replace the existing ia64 implementation in
> the long term? I'd hate to see us having competing implementations
> here. Also, Heiko, would this framework work with your s390 requirements
> for vmem_map? I know that you have a particularly challenging physical
> layout. It would be great to see just one of these in the kernel.
Hmm.. this implementation still requires sparsemem. Maybe it would be
possible to implement a generic vmem_map infrastructure that works with
and without sparsemem?
I would be more than happy to get rid of the s390-specific vmem_map
implementation (it was merged in the meantime).
* Re: [RFC][PATCH] vmemmap on sparsemem v2
2006-12-10 15:19 ` Heiko Carstens
@ 2006-12-11 1:09 ` KAMEZAWA Hiroyuki
2006-12-11 17:23 ` Christoph Lameter
1 sibling, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2006-12-11 1:09 UTC (permalink / raw)
To: Heiko Carstens; +Cc: apw, linux-mm, clameter, schwidefsky
On Sun, 10 Dec 2006 16:19:31 +0100
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > Would we expect to see this replace the existing ia64 implementation in
> > the long term? I'd hate to see us having competing implementations
> > here. Also, Heiko, would this framework work with your s390 requirements
> > for vmem_map? I know that you have a particularly challenging physical
> > layout. It would be great to see just one of these in the kernel.
>
> Hmm.. this implementation still requires sparsemem. Maybe it would be
> possible to implement a generic vmem_map infrastructure that works with
> and without sparsemem?
Maybe we need to:
(1) stop relying on the PAGE_SIZE alignment of sparsemem's mem_map,
(2) implement pfn_valid(),
(3) add a generic-style call for creating the mem_map from a list of pfn
ranges, plus a vmem_map alignment concept (see the sketch below).
Anything else?
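(Purely illustrative, one hypothetical shape for (3); these names are
invented for discussion and are not existing kernel API:)

int vmem_map_populate(unsigned long start_pfn, unsigned long nr_pages,
		      int node);	/* create mem_map for a pfn range */
void vmem_map_depopulate(unsigned long start_pfn, unsigned long nr_pages);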
-Kame
* Re: [RFC][PATCH] vmemmap on sparsemem v2
2006-12-10 15:19 ` Heiko Carstens
2006-12-11 1:09 ` KAMEZAWA Hiroyuki
@ 2006-12-11 17:23 ` Christoph Lameter
1 sibling, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2006-12-11 17:23 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andy Whitcroft, KAMEZAWA Hiroyuki, Linux-MM, Martin Schwidefsky
On Sun, 10 Dec 2006, Heiko Carstens wrote:
> Hmm.. this implementation still requires sparsemem. Maybe it would be
> possible to implement a generic vmem_map infrastructure that works with
> and without sparsemem?
What is the additional sparsemem overhead still around with this patchset?
I thought the sparsemem tables were replaced by the page tables?