linux-mm.kvack.org archive mirror
* RE: [RFC/PATCH]  pfn_valid() more generic : arch independent part[0/2]
@ 2004-10-07  5:22 Luck, Tony
  2004-10-07  6:28 ` Hiroyuki KAMEZAWA
  0 siblings, 1 reply; 10+ messages in thread
From: Luck, Tony @ 2004-10-07  5:22 UTC (permalink / raw)
  To: Hiroyuki KAMEZAWA, Martin J. Bligh; +Cc: LinuxIA64, linux-mm

>Because pfn_valid() often returns 0 in the inner loop of free_pages_bulk(),
>I want to avoid the page faults caused by using get_user() in pfn_valid().

How often?  Surely this is only a problem at the edges of blocks
of memory?  I suppose it depends on whether your discontig memory
appears in blocks much smaller than MAX_ORDER.  But even there it
should only be an issue when coalescing buddies that are bigger than
the granule size (since all of the pages in a granule on ia64 are
guaranteed to exist, the buddy of any page must also exist).
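
To make that argument concrete, a small sketch, assuming 16KB pages and
a 16MB granule (so 1024 pages per granule); buddy_pfn() here is only
illustrative, not the kernel's helper:

	/* standard buddy computation: flip the bit for this order */
	static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
	{
		return pfn ^ (1UL << order);
	}

	/*
	 * With 1024 pages per granule, for any order < 10 the buddy
	 * differs from pfn only in bits below the granule boundary, so
	 * it lies in the same granule.  Since every page of an existing
	 * granule exists, pfn_valid() on the buddy can only fail when
	 * coalescing blocks of a granule or larger.
	 */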

Do you have some data to show that this is a problem?

-Tony

* Re: [RFC/PATCH]  pfn_valid() more generic : arch independent part[0/2]
  2004-10-07  5:22 [RFC/PATCH] pfn_valid() more generic : arch independent part[0/2] Luck, Tony
@ 2004-10-07  6:28 ` Hiroyuki KAMEZAWA
  2004-10-07  6:51   ` align vmemmap to ia64's granule Hiroyuki KAMEZAWA
  2004-10-07 14:38   ` [RFC/PATCH] pfn_valid() more generic : arch independent part[0/2] Martin J. Bligh
  0 siblings, 2 replies; 10+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-10-07  6:28 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Martin J. Bligh, LinuxIA64, linux-mm

Hi,
Luck, Tony wrote:

>>Because pfn_valid() often returns 0 in the inner loop of free_pages_bulk(),
>>I want to avoid the page faults caused by using get_user() in pfn_valid().
> 
> How often?  Surely this is only a problem at the edges of blocks
> of memory?  I suppose it depends on whether your discontig memory
> appears in blocks much smaller than MAX_ORDER.  But even there it
> should only be an issue when coalescing buddies that are bigger than
> the granule size (since all of the pages in a granule on ia64 are
> guaranteed to exist, the buddy of any page must also exist).
> 
Currently, my Tiger4 shows a memory map like this.
This is a record of the memmap_init() calls made by virtual_memmap_init().
NOTE: a MAX_ORDER block is 4GB here.

mem_map(1) from  36e    length 1fb6d  --- ZONE_DMA    (36e to 1fedb)
mem_map(2) from  1fedc  length   124  --- ZONE_DMA    (1fedc to 20000)
ZONE_DMA is 0G to 4G.
mem_map(3) from  40000  length 40000  --- ZONE_NORMAL (4G to 8G, this mem_map is aligned)
mem_map(4) from  a0000  length 20000  --- ZONE_NORMAL (10G to 12G)
mem_map(5) from  bfedc  length   124  --- ZONE_NORMAL (this is contained in mem_map(4))
ZONE_NORMAL is 4G to 12G.

The node's start_pfn and end_pfn are aligned to the granule size, but holes in the memmap are not.
The vmemmap is only aligned to the number of page structs that fit in one page.

virtual_memmap_init() is called directly from efi_memmap_walk() and
does not take ia64's granule size into account.

Hmm....
It looks like what I should do is align the memmap to ia64's granule.
Thanks for your advice; I may have taken this problem too seriously.

If the vmemmap is granule-aligned, ia64_pfn_valid() will work fine, or
only a one-level table will be needed.

Thanks.

Kame <kamezawa.hiroyu@jp.fujitsu.com>

* Re: align vmemmap to ia64's granule
  2004-10-07  6:28 ` Hiroyuki KAMEZAWA
@ 2004-10-07  6:51   ` Hiroyuki KAMEZAWA
  2004-10-07 14:38   ` [RFC/PATCH] pfn_valid() more generic : arch independent part[0/2] Martin J. Bligh
  1 sibling, 0 replies; 10+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-10-07  6:51 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Hiroyuki KAMEZAWA, Martin J. Bligh, LinuxIA64, linux-mm

Hi, Tony

This patch, against 2.6.9-rc3, makes the vmemmap aligned to ia64's granule size.
Please apply it if the vmemmap is expected to be aligned to the granule size.

Kame <kamezawa.hiroyu@jp.fujitsu.com>

Hiroyuki KAMEZAWA wrote:
> Hi,
> Luck, Tony wrote:
> 
>>> Because pfn_valid() often returns 0 in the inner loop of free_pages_bulk(),
>>> I want to avoid the page faults caused by using get_user() in pfn_valid().
>>
>> How often?  Surely this is only a problem at the edges of blocks
>> of memory?  I suppose it depends on whether your discontig memory
>> appears in blocks much smaller than MAX_ORDER.  But even there it
>> should only be an issue when coalescing buddies that are bigger than
>> the granule size (since all of the pages in a granule on ia64 are
>> guaranteed to exist, the buddy of any page must also exist).

> The node's start_pfn and end_pfn are aligned to the granule size, but
> holes in the memmap are not.
> The vmemmap is only aligned to the number of page structs that fit in one page.


---

  test-kernel-kamezawa/arch/ia64/mm/init.c |    2 ++
  1 files changed, 2 insertions(+)

diff -puN arch/ia64/mm/init.c~vmemmap_align_granule arch/ia64/mm/init.c
--- test-kernel/arch/ia64/mm/init.c~vmemmap_align_granule	2004-10-07 15:24:08.322733968 +0900
+++ test-kernel-kamezawa/arch/ia64/mm/init.c	2004-10-07 15:30:58.623358792 +0900
@@ -411,6 +411,8 @@ virtual_memmap_init (u64 start, u64 end,

  	args = (struct memmap_init_callback_data *) arg;

+	start = GRANULEROUNDDOWN(start);
+	end  = GRANULEROUNDUP(end);
  	map_start = vmem_map + (__pa(start) >> PAGE_SHIFT);
  	map_end   = vmem_map + (__pa(end) >> PAGE_SHIFT);


_
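
For reference, the rounding macros used above come from ia64's headers
(include/asm-ia64/pgtable.h); roughly, assuming IA64_GRANULE_SIZE is a
power of two:

	#define GRANULEROUNDDOWN(n)	((n) & ~(IA64_GRANULE_SIZE - 1))
	#define GRANULEROUNDUP(n)	(((n) + IA64_GRANULE_SIZE - 1) \
					 & ~(IA64_GRANULE_SIZE - 1))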


-- 
--the clue is these footmarks leading to the door.--
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


* Re: [RFC/PATCH]  pfn_valid() more generic : arch independent part[0/2]
  2004-10-07  6:28 ` Hiroyuki KAMEZAWA
  2004-10-07  6:51   ` align vmemmap to ia64's granule Hiroyuki KAMEZAWA
@ 2004-10-07 14:38   ` Martin J. Bligh
  2004-10-07 23:38     ` Hiroyuki KAMEZAWA
  1 sibling, 1 reply; 10+ messages in thread
From: Martin J. Bligh @ 2004-10-07 14:38 UTC (permalink / raw)
  To: Hiroyuki KAMEZAWA, Luck, Tony; +Cc: LinuxIA64, linux-mm

> mem_map(1) from  36e    length 1fb6d  --- ZONE_DMA    (36e to 1fedb)
> mem_map(2) from  1fedc  length   124  --- ZONE_DMA    (1fedc to 20000)
> ZONE_DMA is 0G to 4G.
> mem_map(3) from  40000  length 40000  --- ZONE_NORMAL (4G to 8G, this mem_map is aligned)
> mem_map(4) from  a0000  length 20000  --- ZONE_NORMAL (10G to 12G)
> mem_map(5) from  bfedc  length   124  --- ZONE_NORMAL (this is contained in mem_map(4))
> ZONE_NORMAL is 4G to 12G.
> 
> The node's start_pfn and end_pfn are aligned to the granule size, but holes in the memmap are not.
> The vmemmap is only aligned to the number of page structs that fit in one page.
> 
> virtual_memmap_init() is called directly from efi_memmap_walk() and
> does not take ia64's granule size into account.
> 
> Hmm....
> It looks like what I should do is align the memmap to ia64's granule.
> Thanks for your advice; I may have taken this problem too seriously.
> 
> If the vmemmap is granule-aligned, ia64_pfn_valid() will work fine, or
> only a one-level table will be needed.

The normal way to fix the above is just to have a bitmap array to test -
in your case a 1GB granularity would be sufficient. That takes < 1 word
to implement for the example above ;-)
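
A minimal sketch of that approach (hypothetical names; assumes 16KB
pages, 1GB chunks, and a map filled in while walking the firmware
memory map at boot):

	/* one validity bit per 1GB chunk; the 12GB example above
	 * needs 12 bits, so a single word suffices (this sketch
	 * assumes fewer chunks than bits in a long). */
	#define CHUNK_SHIFT	(30 - 14)	/* 1GB chunks, 16KB pages */

	static unsigned long valid_chunk_map;	/* set up at boot time */

	static inline int chunk_pfn_valid(unsigned long pfn)
	{
		return (valid_chunk_map >> (pfn >> CHUNK_SHIFT)) & 1;
	}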

M.


* Re: [RFC/PATCH]  pfn_valid() more generic : arch independent part[0/2]
  2004-10-07 14:38   ` [RFC/PATCH] pfn_valid() more generic : arch independent part[0/2] Martin J. Bligh
@ 2004-10-07 23:38     ` Hiroyuki KAMEZAWA
  0 siblings, 0 replies; 10+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-10-07 23:38 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Luck, Tony, LinuxIA64, linux-mm

Martin J. Bligh wrote:
> The normal way to fix the above is just to have a bitmap array to test -
> in your case a 1GB granularity would be sufficient. That takes < 1 word
> to implement for the example above ;-)
> 
> M.
> 

Although I don't like page faults, I now understand that they don't happen often.
I'd like to keep the current implementation.

Thanks

Kame <kamezawa.hiroyu@jp.fujitsu.com>

* RE: [RFC/PATCH]  pfn_valid() more generic : arch independent part[0/2]
  2004-10-07 15:53 Luck, Tony
@ 2004-10-07 16:02 ` Martin J. Bligh
  0 siblings, 0 replies; 10+ messages in thread
From: Martin J. Bligh @ 2004-10-07 16:02 UTC (permalink / raw)
  To: Luck, Tony, Hiroyuki KAMEZAWA; +Cc: LinuxIA64, linux-mm

--"Luck, Tony" <tony.luck@intel.com> wrote (on Thursday, October 07, 2004 08:53:32 -0700):

>> The normal way to fix the above is just to have a bitmap array
>> to test - in your case a 1GB granularity would be sufficient. That
>> takes < 1 word to implement for the example above ;-)
> 
> In the general case you need a bit for each granule (since that is the
> unit that the kernel admits/denies the existence of memory).  But the
> really sparse systems end up with a large bitmap.  SGI Altix uses 49
> physical address bits, and a granule size of 16MB ... so we need 2^25
> bits ... i.e. 4MBbytes.  While that's a drop in the ocean on a 4TB
> machine, it still seems a pointless waste.

If it's that sparse, it might be worth having another data structure,
perhaps a tree, or some form of hierarchical bitmap. But probably the
most important thing is to do it in one cacheline read, so personally
I'd stick with the array. Whatever you choose, I still don't understand
where all that code came from ;-)

M.


* RE: [RFC/PATCH]  pfn_valid() more generic : arch independent part[0/2]
@ 2004-10-07 15:53 Luck, Tony
  2004-10-07 16:02 ` Martin J. Bligh
  0 siblings, 1 reply; 10+ messages in thread
From: Luck, Tony @ 2004-10-07 15:53 UTC (permalink / raw)
  To: Martin J. Bligh, Hiroyuki KAMEZAWA; +Cc: LinuxIA64, linux-mm

>The normal way to fix the above is just to have a bitmap array
>to test - in your case a 1GB granularity would be sufficient. That
>takes < 1 word to implement for the example above ;-)

In the general case you need a bit for each granule (since that is the
unit in which the kernel admits/denies the existence of memory).  But the
really sparse systems end up with a large bitmap.  SGI Altix uses 49
physical address bits, and a granule size of 16MB ... so we need 2^25
bits ... i.e. 4MB.  While that's a drop in the ocean on a 4TB
machine, it still seems a pointless waste.
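
A back-of-envelope check of that figure, as a standalone sketch (the
49 address bits and the 16MB granule are the numbers above):

	#include <stdio.h>

	int main(void)
	{
		unsigned long bits  = 1UL << (49 - 24);	/* one bit per 16MB granule */
		unsigned long bytes = bits >> 3;	/* 2^22 bytes */

		printf("%lu bits -> %lu bytes (4MB)\n", bits, bytes);
		return 0;
	}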

-Tony

* Re: [RFC/PATCH]  pfn_valid() more generic : arch independent part[0/2]
  2004-10-06 15:14 ` Martin J. Bligh
@ 2004-10-07  0:10   ` Hiroyuki KAMEZAWA
  0 siblings, 0 replies; 10+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-10-07  0:10 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: LinuxIA64, linux-mm

Martin J. Bligh wrote:

>>This is the generic part.
>>
>>Boot-time routine:
>>First, information about valid pages is gathered into a list.
>>After all information has been gathered, a two-level table is created.
>>The reason I create a table instead of using a list is purely for good cache hits.
>>
>>pfn_valid_init()  <- initialize some structures
>>validate_pages(start,size) <- gather valid pfn information
>>pfn_valid_setup() <- create the 1st and 2nd level tables.
> 
> Boggle. What on earth are you trying to do?
> 
I just want to test whether a struct page for a given pfn exists or not.
ia64 has holes in the memmap within a zone, so ia64_pfn_valid() uses
get_user() to test whether a page struct exists or not.

In my no-bitmap buddy allocator, I must call pfn_valid() for ia64 in every
loop iteration of free_pages_bulk() (in mm/page_alloc.c).
Because of the holes in the memmap, bad_range() (in mm/page_alloc.c) is not
a sufficient check.

The code will look like this:

while (...) {
	pfn_of_buddy = some_func(pfn);
	if (bad_range(pfn_of_buddy))
		break;
	if (!pfn_valid(pfn_of_buddy))	/* only for ia64; this test
					   disappears on other archs */
		break;
	....
}

Because pfn_valid() often returns 0 in the inner loop of free_pages_bulk(),
I want to avoid the page faults caused by using get_user() in pfn_valid().

I have two plans: (1) modify pfn_valid(), or (2) modify bad_range().
This is plan (1).

In plan (2), the 1st/2nd level tables would be attached to each zone/pgdat.


> pfn_valid does exactly one thing - it checks whether there is a struct
> page for that pfn. Nothing else. Surely that can't possibly take a tenth
> of this amount of code?
> 
> M.

Kame <kamezawa.hiroyu@jp.fujitsu.com>


* Re: [RFC/PATCH]  pfn_valid() more generic : arch independent part[0/2]
  2004-10-06  6:37 Hiroyuki KAMEZAWA
@ 2004-10-06 15:14 ` Martin J. Bligh
  2004-10-07  0:10   ` Hiroyuki KAMEZAWA
  0 siblings, 1 reply; 10+ messages in thread
From: Martin J. Bligh @ 2004-10-06 15:14 UTC (permalink / raw)
  To: Hiroyuki KAMEZAWA, LinuxIA64; +Cc: linux-mm

> This is the generic part.
> 
> Boot-time routine:
> First, information about valid pages is gathered into a list.
> After all information has been gathered, a two-level table is created.
> The reason I create a table instead of using a list is purely for good cache hits.
> 
> pfn_valid_init()  <- initialize some structures
> validate_pages(start,size) <- gather valid pfn information
> pfn_valid_setup() <- create the 1st and 2nd level tables.


Boggle. What on earth are you trying to do?

pfn_valid does exactly one thing - it checks whether there is a struct
page for that pfn. Nothing else. Surely that can't possibly take a tenth
of this amount of code?

M.


* [RFC/PATCH]  pfn_valid() more generic : arch independent part[0/2]
@ 2004-10-06  6:37 Hiroyuki KAMEZAWA
  2004-10-06 15:14 ` Martin J. Bligh
  0 siblings, 1 reply; 10+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-10-06  6:37 UTC (permalink / raw)
  To: LinuxIA64; +Cc: linux-mm

This is the generic part.

Boot-time routine:
First, information about valid pages is gathered into a list.
After all information has been gathered, a two-level table is created.
The reason I create a table instead of using a list is purely for good cache hits.

pfn_valid_init()  <- initialize some structures
validate_pages(start,size) <- gather valid pfn information
pfn_valid_setup() <- create the 1st and 2nd level tables.
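
A sketch of the intended boot-time call order (illustrative only; in
this patch validate_pages() is actually called from memmap_init_zone(),
and the call sites for the other two would come from the arch part):

	pfn_valid_init();		/* reset the boot-time list */
	/* ... for each valid memory range found at boot ... */
	validate_pages(start_pfn, nr_pages);
	/* ... once all ranges are known ... */
	pfn_valid_setup();		/* build 1st and 2nd level tables */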



Kame <kamezawa.hiroyu@jp.fujitsu.com>


---

 test-pfn-valid-kamezawa/include/linux/mm.h        |    2
 test-pfn-valid-kamezawa/include/linux/pfn_valid.h |   51 +++++
 test-pfn-valid-kamezawa/mm/page_alloc.c           |  191 ++++++++++++++++++++++
 3 files changed, 244 insertions(+)

diff -puN /dev/null include/linux/pfn_valid.h
--- /dev/null	2004-06-25 03:05:40.000000000 +0900
+++ test-pfn-valid-kamezawa/include/linux/pfn_valid.h	2004-10-05 12:03:54.000000000 +0900
@@ -0,0 +1,51 @@
+#ifndef _LINUX_PFN_VALID_H
+#define _LINUX_PFN_VALID_H
+/*
+ * Implementing pfn_valid() for managing memory hole.
+ * this uses 2 level table.
+ * 1st table is accessed by index of (pfn >> PFN_VALID_MAPSHIFT).
+ * It has rough information and pointer to 2nd table.
+ * If rough information is enough, 2nd table is not accessed.
+ * 2nd table has (start_pfn, nr_pages) entry which are sorted by start_pfn.
+ */
+
+#ifdef CAREFUL_PFN_VALID
+/* for 2nd level */
+struct pfn_valid_info {
+	unsigned long start_pfn;
+	unsigned long end_pfn;   /* Caution: end_pfn is not included in
+                                   this valid pfn range */
+};
+/* for 1st level */
+typedef union {
+	unsigned short valid;    /* for fast checking; takes 2 special values */
+	unsigned short index;    /* index into 2nd level table; search start */
+} pfn_validmap_t;
+
+#define PFN_ALL_INVALID 0xffff
+#define PFN_ALL_VALID   0xfffe
+#define pfn_all_valid(ent)   ((ent)->valid == PFN_ALL_VALID)
+#define pfn_all_invalid(ent) ((ent)->valid == PFN_ALL_INVALID)
+
+#ifndef PFN_VALID_MAPSHIFT
+#define PFN_VALID_MAPSHIFT 16
+#endif
+
+#define PFN_VALID_MAPSIZE   (1 << PFN_VALID_MAPSHIFT)
+#define PFN_VALID_MAPMASK   (~(PFN_VALID_MAPSIZE - 1))
+
+extern void __init validate_pages(unsigned long start_pfn,
+				  unsigned long nr_pages);
+extern void __init pfn_valid_init(void);
+extern void __init pfn_valid_setup(void);
+extern int careful_pfn_valid(unsigned long pfn);
+
+#else /* CAREFUL_PFN_VALID */
+
+#define pfn_valid_init() do {} while (0)
+#define validate_pages(a, b) do {} while (0)
+#define pfn_valid_setup() do {} while (0)
+
+
+#endif /* CAREFUL_PFN_VALID */
+#endif
diff -puN mm/page_alloc.c~careful_pfn_valid mm/page_alloc.c
--- test-pfn-valid/mm/page_alloc.c~careful_pfn_valid	2004-10-05 12:03:54.000000000 +0900
+++ test-pfn-valid-kamezawa/mm/page_alloc.c	2004-10-05 15:22:16.000000000 +0900
@@ -1399,6 +1399,8 @@ void __init memmap_init_zone(unsigned lo
 	struct page *start = pfn_to_page(start_pfn);
 	struct page *page;

+	validate_pages(start_pfn, size);
+
 	for (page = start; page < (start + size); page++) {
 		set_page_zone(page, NODEZONE(nid, zone));
 		set_page_count(page, 0);
@@ -2069,3 +2071,192 @@ void *__init alloc_large_system_hash(con

 	return table;
 }
+
+
+#ifdef CAREFUL_PFN_VALID
+/*
+ * This structure is not used while the system is running; it is used
+ * only for setting up the tables at boot time.
+ */
+struct pfn_valid_info_list {
+	struct list_head list;
+ 	struct pfn_valid_info info;
+};
+
+int num_pfn_valid_info;
+unsigned long max_valid_pfn;
+pfn_validmap_t *pfn_validmap;
+struct pfn_valid_info *pfn_valid_info_table;
+struct list_head __initdata pfn_valid_info_head;
+struct list_head __initdata pfn_valid_info_free;
+struct pfn_valid_info_list pfn_valid_info_list_pool[8 * MAX_NUMNODES] __initdata;
+
+/*
+ * initialize all structures and allocate pfn_valid_info_list.
+ * pfn_valid_info_lists are freed when we finish initialization.
+ */
+void __init pfn_valid_init()
+{
+	struct pfn_valid_info_list *info;
+	int i, num;
+	INIT_LIST_HEAD(&pfn_valid_info_head);
+	INIT_LIST_HEAD(&pfn_valid_info_free);
+	/* this memory is used only in boot-time */
+	info = pfn_valid_info_list_pool;
+	num = 8 * MAX_NUMNODES;
+	for (i = 0;i < num; i++, info++) {
+		list_add(&info->list, &pfn_valid_info_free);
+	}
+	num_pfn_valid_info = 0;
+	max_valid_pfn = 0;
+	pfn_validmap = NULL;
+	pfn_valid_info_table = NULL;
+}
+
+static struct pfn_valid_info_list * __init alloc_pfn_valid_info_list(
+				        unsigned long start_pfn,
+					unsigned long nr_pages)
+{
+	struct pfn_valid_info_list *ret;
+	struct list_head *top;
+	if (list_empty(&pfn_valid_info_free)) {
+		printk("pfn valid info entries exhausted. too many small memory chunks?");
+		BUG();
+	}
+ 	top = pfn_valid_info_free.next;
+	list_del(top);
+	ret = list_entry(top, struct pfn_valid_info_list, list);
+	ret->info.start_pfn = start_pfn;
+	ret->info.end_pfn = start_pfn + nr_pages;
+	num_pfn_valid_info++;
+	return ret;
+}
+
+static void __init free_pfn_valid_info_list(struct pfn_valid_info_list *ent)
+{
+	list_add(&ent->list, &pfn_valid_info_free);
+	num_pfn_valid_info--;
+}
+
+void __init validate_pages(unsigned long start_pfn,
+			   unsigned long nr_pages)
+{
+	struct pfn_valid_info_list *new, *ent, *next;
+	struct list_head *pos;
+	/* add entries */
+	new = alloc_pfn_valid_info_list(start_pfn, nr_pages);
+	list_for_each_entry(ent, &pfn_valid_info_head, list) {
+		if (ent->info.start_pfn >= new->info.start_pfn)
+			break;
+	}
+	list_add_tail(&new->list, &ent->list);
+	/* we must find and coalesce overlapped entries */
+	pos = pfn_valid_info_head.next;
+	while (pos != &pfn_valid_info_head) {
+		if (pos->next == &pfn_valid_info_head)
+			break;
+		ent = list_entry(pos, struct pfn_valid_info_list,list);
+		next = list_entry(pos->next, struct pfn_valid_info_list, list);
+		if ((ent->info.start_pfn <= next->info.start_pfn) &&
+		    (ent->info.end_pfn >= next->info.start_pfn)) {
+			ent->info.end_pfn =
+				(ent->info.end_pfn > next->info.end_pfn)?
+				ent->info.end_pfn : next->info.end_pfn;
+			list_del(pos->next);
+			free_pfn_valid_info_list(next);
+		} else {
+			pos = pos->next;
+		}
+	}
+	if (start_pfn + nr_pages > max_valid_pfn)
+		max_valid_pfn = start_pfn + nr_pages;
+	return;
+}
+
+/*
+ * Before pfn_valid_setup() is called, we only have a list of valid pfn
+ * ranges.  Here we create a table of valid pfn ranges and a map, which
+ * works like a hash table and enables direct access into that table.
+ * We call the map the level-1 table, the table of ranges the level-2 table.
+ * Note: after initialization, the list of valid pfn ranges is discarded.
+ */
+
+void __init pfn_valid_setup(void)
+{
+	struct pfn_valid_info *info;
+	struct pfn_valid_info_list *lent;
+	unsigned long pfn, end, index, offset;
+	int tablesize, mapsize;
+	/* create 2nd level table from list */
+	/* allocate space for table */
+	tablesize = sizeof(struct pfn_valid_info) * (num_pfn_valid_info + 1);
+	tablesize = LONG_ALIGN(tablesize);
+	pfn_valid_info_table = alloc_bootmem(tablesize);
+ 	memset(pfn_valid_info_table, 0, tablesize);
+	/* fill entries */
+	info = pfn_valid_info_table;
+	list_for_each_entry(lent, &pfn_valid_info_head, list) {
+		info->start_pfn = lent->info.start_pfn;
+		info->end_pfn = lent->info.end_pfn;
+		info++;
+	}
+	info->start_pfn = ~(0UL);
+	info->end_pfn = 0;
+
+	/* init level 1 table */
+	mapsize = sizeof(pfn_validmap_t) *
+		((max_valid_pfn >> PFN_VALID_MAPSHIFT) + 1);
+	mapsize = LONG_ALIGN(mapsize);
+	pfn_validmap = alloc_bootmem(mapsize);
+	memset(pfn_validmap, 0, mapsize);
+
+	/* fill level 1 table */
+	for (pfn = 0; pfn < max_valid_pfn; pfn += PFN_VALID_MAPSIZE) {
+		end = pfn + PFN_VALID_MAPSIZE - 1;
+		for (info = pfn_valid_info_table, offset=0;
+		     info->end_pfn != 0;
+		     info++, offset++) {
+			if (((info->start_pfn <= pfn) &&
+			     (info->end_pfn > pfn)) ||
+			     ((info->start_pfn > pfn) &&
+			      (info->start_pfn < end)) )
+				break;
+		}
+		index = pfn >> PFN_VALID_MAPSHIFT;
+		if (info->end_pfn != 0) {
+			if ((info->start_pfn <= pfn) && (info->end_pfn > end))
+				pfn_validmap[index].valid = PFN_ALL_VALID;
+			else
+				pfn_validmap[index].index = offset;
+		} else {
+			pfn_validmap[index].valid = PFN_ALL_INVALID;
+		}
+	}
+	return;
+}
+
+int careful_pfn_valid(unsigned long pfn)
+{
+	int index;
+	pfn_validmap_t *map;
+	struct pfn_valid_info *info;
+	if (pfn >= max_valid_pfn)
+		return 0;
+	index = pfn >> PFN_VALID_MAPSHIFT;
+	map = &pfn_validmap[index];
+	if (pfn_all_valid(map))
+		return 1;
+	if (pfn_all_invalid(map))
+		return 0;
+	/* go to 2nd level */
+	info = pfn_valid_info_table + map->index;
+	/* table is sorted */
+	while (info->start_pfn <= pfn) {
+		if ((info->start_pfn <= pfn) && (info->end_pfn > pfn))
+			return 1;
+		info++;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(careful_pfn_valid);
+#endif /* CAREFUL_PFN_VALID */
diff -puN include/linux/mm.h~careful_pfn_valid include/linux/mm.h
--- test-pfn-valid/include/linux/mm.h~careful_pfn_valid	2004-10-05 12:03:54.000000000 +0900
+++ test-pfn-valid-kamezawa/include/linux/mm.h	2004-10-05 12:03:54.000000000 +0900
@@ -41,6 +41,8 @@ extern int sysctl_legacy_va_layout;
 #define MM_VM_SIZE(mm)	TASK_SIZE
 #endif

+#include <linux/pfn_valid.h>
+
 /*
  * Linux kernel virtual memory manager primitives.
  * The idea being to have a "virtual" mm in the same way

_
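
With CAREFUL_PFN_VALID defined, an arch would then wire its pfn_valid()
to the new lookup; a hypothetical one-liner, not included in this patch:

	/* hypothetical wiring, e.g. in the arch's asm/page.h */
	#define pfn_valid(pfn)	careful_pfn_valid(pfn)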


end of thread

Thread overview: 10 messages
2004-10-07  5:22 [RFC/PATCH] pfn_valid() more generic : arch independent part[0/2] Luck, Tony
2004-10-07  6:28 ` Hiroyuki KAMEZAWA
2004-10-07  6:51   ` align vmemmap to ia64's granule Hiroyuki KAMEZAWA
2004-10-07 14:38   ` [RFC/PATCH] pfn_valid() more generic : arch independent part[0/2] Martin J. Bligh
2004-10-07 23:38     ` Hiroyuki KAMEZAWA
  -- strict thread matches above, loose matches on Subject: below --
2004-10-07 15:53 Luck, Tony
2004-10-07 16:02 ` Martin J. Bligh
2004-10-06  6:37 Hiroyuki KAMEZAWA
2004-10-06 15:14 ` Martin J. Bligh
2004-10-07  0:10   ` Hiroyuki KAMEZAWA
