* [RFC/PATCH] pfn_valid() more generic : intro[0/2]
From: Hiroyuki KAMEZAWA @ 2004-10-06 6:20 UTC (permalink / raw)
To: LinuxIA64; +Cc: linux-mm
Hi,
ia64's ia64_pfn_valid() uses get_user() to check whether a page struct
is available. I think this is an irregular implementation, and the following patches
provide a more generic replacement, careful_pfn_valid(). It uses a 2-level table.
Core Algorithm
====
The 1st level, pfn_validmap[], holds an index into the 2nd level table.
The 2nd level table consists of (start, end) entries of valid pfns.
careful_pfn_valid(pfn)
  entry = pfn_validmap[pfn >> PFN_VALID_MAPSHIFT]
  if (entry == ALL_VALID) return 1
  if (entry == ALL_INVALID) return 0
  -> check 2nd level:
  info = pfn_valid_info_table + entry;
  while (info->start_pfn <= pfn) {
      if (pfn < info->end_pfn)
          return 1;   /* pfn is inside a valid (start, end) range */
      info++;
  }
  return 0;           /* no valid range contains pfn */
====
sizeof(entry) is 2 bytes, and each entry covers 1GB with the current config (16k pages).
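
The lookup described above could be sketched in C roughly as follows. This is only an illustration, not the code from the patch: the sentinel values, the table contents, MAP_ENTRIES, the terminator entry, and the shift of 16 (derived from 1GB per entry with 16k pages) are all assumptions made for the example. It treats the 2nd level entries as valid [start, end) ranges, as the description of the 2nd level table states.

```c
#include <stdint.h>

/* Everything below is an illustrative assumption, not the patch's definitions. */
#define PFN_VALID_MAPSHIFT 16       /* 2^16 pfns * 16KB pages = 1GB per entry */
#define ALL_VALID   0xffffu         /* hypothetical sentinel: whole chunk is valid */
#define ALL_INVALID 0xfffeu         /* hypothetical sentinel: whole chunk is invalid */
#define MAP_ENTRIES 8               /* hypothetical: covers 8GB of physical space */

struct pfn_valid_info {
    unsigned long start_pfn;        /* first valid pfn of the range */
    unsigned long end_pfn;          /* one past the last valid pfn */
};

/* 2nd level: sorted valid ranges; a start_pfn of ~0UL ends every walk. */
static const struct pfn_valid_info pfn_valid_info_table[] = {
    { 0x20000, 0x24000 },
    { 0x28000, 0x30000 },
    { ~0UL, ~0UL },
};

/* 1st level: one 2-byte entry per 1GB chunk. Chunk 2 has holes, so it
 * stores index 0 into pfn_valid_info_table instead of a sentinel. */
static const uint16_t pfn_validmap[MAP_ENTRIES] = {
    ALL_VALID, ALL_INVALID, 0, ALL_INVALID,
    ALL_INVALID, ALL_INVALID, ALL_INVALID, ALL_INVALID,
};

static int careful_pfn_valid(unsigned long pfn)
{
    unsigned long idx = pfn >> PFN_VALID_MAPSHIFT;
    const struct pfn_valid_info *info;
    uint16_t entry;

    if (idx >= MAP_ENTRIES)         /* beyond the 1st level table */
        return 0;
    entry = pfn_validmap[idx];
    if (entry == ALL_VALID)
        return 1;
    if (entry == ALL_INVALID)
        return 0;
    /* Mixed chunk: walk the sorted ranges until one starts above pfn. */
    for (info = pfn_valid_info_table + entry; info->start_pfn <= pfn; info++)
        if (pfn < info->end_pfn)
            return 1;
    return 0;
}
```

With this made-up layout, careful_pfn_valid(0x23fff) returns 1 and careful_pfn_valid(0x24000) returns 0. Neither call can fault, because only the two tables are read.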
Here are kernbench results on my Tiger4 (Itanium2 (1.3GHz) x2, 8 Gbytes memory), pagesize=16k
Average Optimal -j8 Load Run:
                        Elapsed Time  User Time  System Time  Percent CPU  C/Switch  Sleeps
2.6.9-rc3               699.906       1322.01    39.336       194          64390     74416.8
2.6.9-rc3 + this_patch  698.478       1321.76    38.228       194          64502     74185
There is no difference :)
For NUMA, I think tables for careful_pfn_valid() should be copied to each node's local memory,
but I haven't implemented it yet.
-- Kame <kamezawa.hiroyu@jp.fujitsu.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
* Re: [RFC/PATCH] pfn_valid() more generic : intro[0/2]
From: Hiroyuki KAMEZAWA @ 2004-10-06 7:33 UTC (permalink / raw)
To: Luck, Tony; +Cc: LinuxIA64, linux-mm
Hi,
Luck, Tony wrote:
>>ia64's ia64_pfn_valid() uses get_user() for checking whether a
>>page struct is available or not. I think this is an irregular
>>implementation and following patches
>>are a more generic replacement, careful_pfn_valid(). It uses 2
>>level table.
>
>
> It is odd ... but a somewhat convenient way to make check whether
> the page struct exists, while handling the fault if it is in an
> area of virtual mem_map that doesn't exist. I think that in practice
> we rarely call it with a pfn that generates a fault (except in error
> paths).
I understand it's a rare case.
Honestly, this patch is for the no-bitmap buddy allocator (which I posted before).
With the no-bitmap buddy allocator, pfn_valid() returns 0 in many cases
(because MAX_ORDER is 4GB).
So I decided to write an experimental pfn_valid() that doesn't cause a fault.
> How big will the pfn_validmap[] be for a very sparse physical space
> like SGI Altix? I'm not sure I see how PFN_VALID_MAPSHIFT is
> generated for each system.
>
PFN_VALID_MAPSHIFT can be overridden in each asm-xxx/page.h (or it could go in config.h).
I think each special architecture can pick a suitable value if it wants.
If Altix has XXX Tbytes per node, a value that lets 1 cache line (64 bytes = 32 entries) cover
each node's maximum size would be good.
1st level table:
  With the current configuration, 2 bytes per 1 Gbyte, so 1 page (16k pages) covers 8 Tbytes.
2nd level table:
  8 bytes per entry. Entries are coalesced with each other as much as possible.
If the memory layout is like a bee's nest, careful_pfn_valid() will need a large amount
of memory and will not work well because of the searching.
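
The 1st level sizes quoted above can be checked with a little arithmetic. The following is a rough sketch only: the constant names other than PFN_VALID_MAPSHIFT are made up for the example, and the shift of 16 is an assumption derived from "1 Gbyte per entry with 16k pages", not a value taken from the patch.

```c
#include <stdint.h>

/* Assumed values for illustration; only the 2-byte entry size, the 16k page
 * size and the 64-byte cache line come from the thread. */
#define PAGE_SHIFT          14      /* 16KB pages */
#define PFN_VALID_MAPSHIFT  16      /* assumed: 1GB / 16KB = 2^16 pfns per entry */
#define ENTRY_SIZE          2       /* bytes per 1st level entry */

/* Bytes of physical address space covered by one 1st level entry. */
static uint64_t bytes_per_entry(void)
{
    return (uint64_t)1 << (PFN_VALID_MAPSHIFT + PAGE_SHIFT);
}

/* Bytes of physical address space covered by table_bytes of 1st level table. */
static uint64_t coverage(uint64_t table_bytes)
{
    return (table_bytes / ENTRY_SIZE) * bytes_per_entry();
}

/* 1st level table size in bytes needed to cover phys_bytes, rounded up. */
static uint64_t table_size(uint64_t phys_bytes)
{
    uint64_t entries = (phys_bytes + bytes_per_entry() - 1) / bytes_per_entry();

    return entries * ENTRY_SIZE;
}
```

Under these assumptions, coverage(64) is 32 Gbytes for one cache line and coverage(16384) is 8 Tbytes for one page, matching the figures above. A larger PFN_VALID_MAPSHIFT shrinks the 1st level table but pushes more chunks into the 2nd level walk.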
BTW, how sparse is SGI Altix?
> Why do we need a loop when looking in the 2nd level? Can't the
> entry from the 1st level point us to the right place?
>
Consider this case.
A 1st level entry covers 0x1000 - 0x2000
[valid range] 0x1000 - 0x1100
              0x1200 - 0x1500
              0x1600 - 0x2000
pfn_valid(0x1501)
  -> by 1st level, we get 0x1000-0x1100
     into loop            0x1200-0x1500
                          0x1600-        returns 0.
Walking the 2nd level table can reduce the size of the 1st table.
I'd rather avoid a cache miss than avoid a small walk.
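
The walk in this example can be traced with a small C sketch. It is not the patch code: struct pfn_range, walk_ranges(), the visited counter, and the ~0UL terminator entry are hypothetical names and conventions chosen for the illustration, and ranges are assumed to be [start, end) with an exclusive end.

```c
/* Hypothetical 2nd level entry: a valid range [start_pfn, end_pfn). */
struct pfn_range {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* The three valid ranges from the example, sorted, plus an assumed terminator. */
static const struct pfn_range ranges[] = {
    { 0x1000, 0x1100 },
    { 0x1200, 0x1500 },
    { 0x1600, 0x2000 },
    { ~0UL, ~0UL },
};

/* Walk from the entry the 1st level points at. Returns 1 if pfn lies in a
 * valid range, 0 otherwise; *visited counts the entries that were examined. */
static int walk_ranges(const struct pfn_range *info, unsigned long pfn,
                       int *visited)
{
    *visited = 0;
    for (; info->start_pfn <= pfn; info++) {
        (*visited)++;
        if (pfn < info->end_pfn)
            return 1;
    }
    return 0;   /* the next range starts above pfn, so pfn is in a hole */
}
```

For pfn 0x1501 the walk examines the first two ranges, stops when it sees that 0x1600 is above the pfn, and returns 0. The cost is a couple of sequential reads from adjacent 8-byte entries, which is the small walk being traded for a smaller 1st level table.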
- Kame
* RE: [RFC/PATCH] pfn_valid() more generic : intro[0/2]
From: Luck, Tony @ 2004-10-06 6:33 UTC (permalink / raw)
To: Hiroyuki KAMEZAWA, LinuxIA64; +Cc: linux-mm
>ia64's ia64_pfn_valid() uses get_user() for checking whether a
>page struct is available or not. I think this is an irregular
>implementation and following patches
>are a more generic replacement, careful_pfn_valid(). It uses 2
>level table.
It is odd ... but a somewhat convenient way to make check whether
the page struct exists, while handling the fault if it is in an
area of virtual mem_map that doesn't exist. I think that in practice
we rarely call it with a pfn that generates a fault (except in error
paths).
How big will the pfn_validmap[] be for a very sparse physical space
like SGI Altix? I'm not sure I see how PFN_VALID_MAPSHIFT is
generated for each system.
Why do we need a loop when looking in the 2nd level? Can't the
entry from the 1st level point us to the right place?
-Tony