[RFC/PATCH] pfn_valid() more generic : intro[0/2]

* [RFC/PATCH]  pfn_valid() more generic : intro[0/2]
@ 2004-10-06  6:20 Hiroyuki KAMEZAWA
  0 siblings, 0 replies; 3+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-10-06  6:20 UTC (permalink / raw)
  To: LinuxIA64; +Cc: linux-mm

Hi,

ia64's ia64_pfn_valid() uses get_user() to check whether a page struct
is available. I think this is an irregular implementation, and the following patches
provide a more generic replacement, careful_pfn_valid(). It uses a 2-level table.

Core Algorithm
====
1st level: pfn_validmap[] holds an index into the 2nd level table.
2nd level: the table consists of (start, end) entries of valid pfns.

careful_pfn_valid(pfn)
  -> entry = pfn_validmap[(pfn >> PFN_VALID_MAPSHIFT)]
     if (entry == ALL_VALID) return 1
     if (entry == ALL_INVALID) return 0

  -> check 2nd level,
     info = pfn_valid_info_table + entry;
     /* entries hold valid ranges, sorted by start_pfn */
     while (info->start_pfn <= pfn) {
          if (info->end_pfn > pfn)
               return 1;
          info++;
     }
     return 0;
====
sizeof(entry) is 2 bytes, and each entry covers 1GB with the current config (16k pages).
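
As a rough illustration of the lookup described above, here is a minimal,
self-contained C sketch. It is not the code from the patches. The marker values,
the struct layout, the PFN_INFO_END terminator and the shift of 16 (2^16 pfns of
16k pages = 1GB per entry) are assumptions made for the example. It returns 1 when
the pfn falls inside a listed range, following the description of the 2nd level
table as (start, end) entries of valid pfns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and values here are illustrative assumptions, not the patch's. */
#define PFN_VALID_MAPSHIFT 16        /* 2^16 pfns * 16k pages = 1GB per entry */
#define ALL_VALID          0xffffu   /* hypothetical marker: whole chunk valid */
#define ALL_INVALID        0xfffeu   /* hypothetical marker: whole chunk is a hole */
#define PFN_INFO_END       UINT32_MAX /* hypothetical list terminator */

struct pfn_valid_info {              /* 8 bytes per 2nd level entry */
    uint32_t start_pfn;              /* first valid pfn of the range */
    uint32_t end_pfn;                /* one past the last valid pfn */
};

static inline int careful_pfn_valid(const uint16_t *validmap, size_t map_len,
                                    const struct pfn_valid_info *table,
                                    uint32_t pfn)
{
    size_t idx = pfn >> PFN_VALID_MAPSHIFT;
    const struct pfn_valid_info *info;
    uint16_t entry;

    if (idx >= map_len)              /* beyond the 1st level table */
        return 0;
    entry = validmap[idx];
    if (entry == ALL_VALID)
        return 1;
    if (entry == ALL_INVALID)
        return 0;

    /* Walk the sorted ranges until one starts above pfn or the list ends. */
    for (info = table + entry;
         info->start_pfn != PFN_INFO_END && info->start_pfn <= pfn;
         info++) {
        if (pfn < info->end_pfn)
            return 1;
    }
    return 0;
}
```

Taking the tables as parameters is only a convenience for trying the function in
user space; it needs no page-fault handling because it never touches mem_map.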

Here are kernbench results on my Tiger4 (Itanium2 (1.3GHz) x2, 8 Gbytes memory), pagesize=16k:

Average Optimal -j8 Load Run:
                         Elapsed Time  User Time  System Time  Percent CPU  C/Switch   Sleeps
2.6.9-rc3                 699.906       1322.01     39.336        194        64390    74416.8
2.6.9-rc3 + this_patch    698.478       1321.76     38.228        194        64502    74185

There is no difference :)

For NUMA, I think tables for careful_pfn_valid() should be copied to each node's local memory,
but I haven't implemented it yet.

-- Kame <kamezawa.hiroyu@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>


* Re: [RFC/PATCH]  pfn_valid() more generic : intro[0/2]
  2004-10-06  6:33 Luck, Tony
@ 2004-10-06  7:33 ` Hiroyuki KAMEZAWA
  0 siblings, 0 replies; 3+ messages in thread
From: Hiroyuki KAMEZAWA @ 2004-10-06  7:33 UTC (permalink / raw)
  To: Luck, Tony; +Cc: LinuxIA64, linux-mm

Hi,

Luck, Tony wrote:
>>ia64's ia64_pfn_valid() uses get_user() for checking whether a 
>>page struct is available or not. I think this is an irregular 
>>implementation and following patches
>>are a more generic replacement, careful_pfn_valid(). It uses 2 
>>level table.
> 
> 
> It is odd ... but a somewhat convenient way to check whether
> the page struct exists, while handling the fault if it is in an
> area of virtual mem_map that doesn't exist.  I think that in practice
> we rarely call it with a pfn that generates a fault (except in error
> paths).

I understand it's a rare case.
Honestly, this patch is for the no-bitmap buddy allocator (which I posted before).
With the no-bitmap buddy allocator, pfn_valid() returns 0 in many cases
(because MAX_ORDER is 4GB).
So I decided to write an experimental pfn_valid() which doesn't cause a fault.


> How big will the pfn_validmap[] be for a very sparse physical space
> like SGI Altix?  I'm not sure I see how PFN_VALID_MAPSHIFT is 
> generated for each system.
> 
PFN_VALID_MAPSHIFT can be overridden in each asm-xxx/page.h (or it could be in config.h).
I think each special architecture can find a suitable value if it wants.
If Altix has XXX Tbytes for each node, a setting where 1 cache line (64 bytes = 32 entries) covers
each node's maximum size would be good.

1st level table.
With the current configuration, 1 Gbyte per 2-byte entry, 8 Tbytes per 1 page of entries (16k pages).
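
The arithmetic behind those figures can be checked with a small C sketch. The
shift of 16 is not stated in this thread; it is derived here from "1 Gbyte per
entry with 16k pages" (1GB / 16KB = 2^16 pfns) and should be read as an
assumption, as should the helper names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT          14   /* 16k pages, as in the thread */
#define PFN_VALID_MAPSHIFT  16   /* assumed: 2^16 pfns per 1st level entry */
#define ENTRY_BYTES         2u   /* size of one 1st level entry */

/* Bytes of physical memory covered by one 1st level entry. */
static inline uint64_t bytes_per_entry(void)
{
    return (uint64_t)1 << (PAGE_SHIFT + PFN_VALID_MAPSHIFT);
}

/* Bytes of physical memory covered by table_bytes of 1st level entries. */
static inline uint64_t coverage(uint64_t table_bytes)
{
    return (table_bytes / ENTRY_BYTES) * bytes_per_entry();
}

/* Size in bytes of the 1st level table needed to span pfns [0, max_pfn). */
static inline uint64_t validmap_bytes(uint64_t max_pfn)
{
    uint64_t chunk = (uint64_t)1 << PFN_VALID_MAPSHIFT;

    return ((max_pfn + chunk - 1) / chunk) * ENTRY_BYTES;
}
```

Under these assumptions one 64-byte cache line of entries spans 32 Gbytes, one
16k page of entries spans 8 Tbytes, and an 8 Gbyte machine needs only 16 bytes
of 1st level table.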

2nd level table.
Each entry is 8 bytes. Entries are coalesced with each other as much as possible.
If the memory layout is like a bee's nest, careful_pfn_valid() will need a large amount
of memory and will not work well because of the searching.
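
One way the coalescing could work is sketched below in C. This is not taken from
the patches; the struct and function names are hypothetical, and the input is
assumed to be already sorted by start_pfn with exclusive end_pfn values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pfn_range {               /* hypothetical 8-byte 2nd level entry */
    uint32_t start_pfn;          /* first valid pfn */
    uint32_t end_pfn;            /* one past the last valid pfn */
};

/*
 * Merge adjacent or overlapping ranges in place.
 * The array must be sorted by start_pfn. Returns the new entry count.
 */
static inline size_t coalesce_ranges(struct pfn_range *r, size_t n)
{
    size_t out = 0, i;

    if (n == 0)
        return 0;
    for (i = 1; i < n; i++) {
        if (r[i].start_pfn <= r[out].end_pfn) {
            /* Touches or overlaps the current range: extend it. */
            if (r[i].end_pfn > r[out].end_pfn)
                r[out].end_pfn = r[i].end_pfn;
        } else {
            /* A hole separates the ranges: start a new entry. */
            r[++out] = r[i];
        }
    }
    return out + 1;
}
```

The fewer holes the memory map has, the fewer entries survive the merge, which
keeps both the table size and the walk short.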


BTW, how sparse is SGI Altix?

> Why do we need a loop when looking in the 2nd level?  Can't the
> entry from the 1st level point us to the right place?
> 
Consider this case.

A 1st level entry covers 0x1000 - 0x2000.
[valid ranges         ]  0x1000 - 0x1100
                          0x1200 - 0x1500
                          0x1600 - 0x2000

pfn_valid(0x1501)
             -> by 1st level, we get 0x1000-0x1100
                              into loop  0x1200-0x1500
                                         0x1600-       returns 0.

Walking the 2nd level table can reduce the size of the 1st level table.
I'd rather avoid a cache miss than avoid a small walk.
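
The walk in this example can be traced with a short C sketch. The struct, the
terminator value and the visit counter are assumptions added for illustration;
only the three ranges come from the example above.

```c
#include <assert.h>
#include <stdint.h>

struct pfn_range {
    uint32_t start_pfn;          /* first valid pfn */
    uint32_t end_pfn;            /* one past the last valid pfn */
};

#define PFN_LIST_END UINT32_MAX  /* hypothetical list terminator */

/* The three valid ranges from the example, plus the assumed terminator. */
static const struct pfn_range example_ranges[] = {
    { 0x1000, 0x1100 }, { 0x1200, 0x1500 }, { 0x1600, 0x2000 },
    { PFN_LIST_END, PFN_LIST_END },
};

/*
 * Returns 1 if pfn lies in a valid range, 0 otherwise.
 * *visited receives the number of entries whose start_pfn was <= pfn.
 */
static inline int walk_ranges(const struct pfn_range *info, uint32_t pfn,
                              int *visited)
{
    *visited = 0;
    for (; info->start_pfn != PFN_LIST_END && info->start_pfn <= pfn; info++) {
        (*visited)++;
        if (pfn < info->end_pfn)
            return 1;
    }
    return 0;
}
```

For pfn 0x1501 the walk examines 0x1000-0x1100 and 0x1200-0x1500, then stops at
0x1600 because that range starts above the pfn, and returns 0. The walk touches
consecutive 8-byte entries, so a short list fits in a single cache line.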


- Kame



* RE: [RFC/PATCH]  pfn_valid() more generic : intro[0/2]
@ 2004-10-06  6:33 Luck, Tony
  2004-10-06  7:33 ` Hiroyuki KAMEZAWA
  0 siblings, 1 reply; 3+ messages in thread
From: Luck, Tony @ 2004-10-06  6:33 UTC (permalink / raw)
  To: Hiroyuki KAMEZAWA, LinuxIA64; +Cc: linux-mm

>ia64's ia64_pfn_valid() uses get_user() for checking whether a 
>page struct is available or not. I think this is an irregular 
>implementation and following patches
>are a more generic replacement, careful_pfn_valid(). It uses 2 
>level table.

It is odd ... but a somewhat convenient way to check whether
the page struct exists, while handling the fault if it is in an
area of virtual mem_map that doesn't exist.  I think that in practice
we rarely call it with a pfn that generates a fault (except in error
paths).

How big will the pfn_validmap[] be for a very sparse physical space
like SGI Altix?  I'm not sure I see how PFN_VALID_MAPSHIFT is 
generated for each system.

Why do we need a loop when looking in the 2nd level?  Can't the
entry from the 1st level point us to the right place?

-Tony
