* [PATCH] rmap 13a
@ 2002-05-07 2:17 Rik van Riel
2002-05-07 17:37 ` Christoph Hellwig
0 siblings, 1 reply; 23+ messages in thread
From: Rik van Riel @ 2002-05-07 2:17 UTC (permalink / raw)
To: linux-mm; +Cc: linux-kernel
The first maintenance release of the 13th version of the reverse
mapping based VM is now available.
This is an attempt at making a more robust and flexible VM
subsystem, while cleaning up a lot of code at the same time.
The patch is available from:
http://surriel.com/patches/2.4/2.4.19p7-rmap-13a
and http://linuxvm.bkbits.net/
My big TODO items for a next release are:
- O(1) page launder - currently functional but slow, needs to be tuned
- pte-highmem (after marcelo has chosen which one for 2.4 ?)
rmap 13a:
- NUMA changes for page_address (Samuel Ortiz)
- replace vm.freepages with simpler kswapd_minfree (Christoph Hellwig)
rmap 13:
- rename touch_page to mark_page_accessed and uninline (Christoph Hellwig)
- NUMA bugfix for __alloc_pages (William Irwin)
- kill __find_page (Christoph Hellwig)
- make pte_chain_freelist per zone (William Irwin)
- protect pte_chains by per-page lock bit (William Irwin)
- minor code cleanups (me)
rmap 12i:
- slab cleanup (Christoph Hellwig)
- remove references to compiler.h from mm/* (me)
- move rmap to marcelo's bk tree (me)
- minor cleanups (me)
rmap 12h:
- hopefully fix OOM detection algorithm (me)
- drop pte quicklist in anticipation of pte-highmem (me)
- replace andrea's highmem emulation by ingo's one (me)
- improve rss limit checking (Nick Piggin)
rmap 12g:
- port to armv architecture (David Woodhouse)
- NUMA fix to zone_table initialisation (Samuel Ortiz)
- remove init_page_count (David Miller)
rmap 12f:
- for_each_pgdat macro (William Lee Irwin)
- put back EXPORT(__find_get_page) for modular rd (me)
- make bdflush and kswapd actually start queued disk IO (me)
rmap 12e:
- RSS limit fix, the limit can be 0 for some reason (me)
- clean up for_each_zone define to not need pgdata_t (William Lee Irwin)
- fix i810_dma bug introduced with page->wait removal (William Lee Irwin)
rmap 12d:
- fix compiler warning in rmap.c (Roger Larsson)
- read latency improvement (read-latency2) (Andrew Morton)
rmap 12c:
- fix small balancing bug in page_launder_zone (Nick Piggin)
- wakeup_kswapd / wakeup_memwaiters code fix (Arjan van de Ven)
- improve RSS limit enforcement (me)
rmap 12b:
- highmem emulation (for debugging purposes) (Andrea Arcangeli)
- ulimit RSS enforcement when memory gets tight (me)
- sparc64 page->virtual quickfix (Greg Procunier)
rmap 12a:
- fix the compile warning in buffer.c (me)
- fix divide-by-zero on highmem initialisation DOH! (me)
- remove the pgd quicklist (suspicious ...) (DaveM, me)
rmap 12:
- keep some extra free memory on large machines (Arjan van de Ven, me)
- higher-order allocation bugfix (Adrian Drzewiecki)
- nr_free_buffer_pages() returns inactive + free mem (me)
- pages from unused objects directly to inactive_clean (me)
- use fast pte quicklists on non-pae machines (Andrea Arcangeli)
- remove sleep_on from wakeup_kswapd (Arjan van de Ven)
- page waitqueue cleanup (Christoph Hellwig)
rmap 11c:
- oom_kill race locking fix (Andres Salomon)
- elevator improvement (Andrew Morton)
- dirty buffer writeout speedup (hopefully ;)) (me)
- small documentation updates (me)
- page_launder() never does synchronous IO, kswapd
and the processes calling it sleep on higher level (me)
- deadlock fix in touch_page() (me)
rmap 11b:
- added low latency reschedule points in vmscan.c (me)
- make i810_dma.c include mm_inline.h too (William Lee Irwin)
- wake up kswapd sleeper tasks on OOM kill so the
killed task can continue on its way out (me)
- tune page allocation sleep point a little (me)
rmap 11a:
- don't let refill_inactive() progress count for OOM (me)
- after an OOM kill, wait 5 seconds for the next kill (me)
- agpgart_be fix for hashed waitqueues (William Lee Irwin)
rmap 11:
- fix stupid logic inversion bug in wakeup_kswapd() (Andrew Morton)
- fix it again in the morning (me)
- add #ifdef BROKEN_PPC_PTE_ALLOC_ONE to rmap.h, it
seems PPC calls pte_alloc() before mem_map[] init (me)
- disable the debugging code in rmap.c ... the code
is working and people are running benchmarks (me)
- let the slab cache shrink functions return a value
to help prevent early OOM killing (Ed Tomlinson)
- also, don't call the OOM code if we have enough
free pages (me)
- move the call to lru_cache_del into __free_pages_ok (Ben LaHaise)
- replace the per-page waitqueue with a hashed
waitqueue, reduces size of struct page from 64
bytes to 52 bytes (48 bytes on non-highmem machines) (William Lee Irwin)
rmap 10:
- fix the livelock for real (yeah right), turned out
to be a stupid bug in page_launder_zone() (me)
- to make sure the VM subsystem doesn't monopolise
the CPU, let kswapd and some apps sleep a bit under
heavy stress situations (me)
- let __GFP_HIGH allocations dig a little bit deeper
into the free page pool, the SCSI layer seems fragile (me)
rmap 9:
- improve comments all over the place (Michael Cohen)
- don't panic if page_remove_rmap() cannot find the
rmap in question, it's possible that the memory was
PG_reserved and belonging to a driver, but the driver
exited and cleared the PG_reserved bit (me)
- fix the VM livelock by replacing > by >= in a few
critical places in the pageout code (me)
- treat the reclaiming of an inactive_clean page like
allocating a new page, calling try_to_free_pages()
and/or fixup_freespace() if required (me)
- when low on memory, don't make things worse by
doing swapin_readahead (me)
rmap 8:
- add ANY_ZONE to the balancing functions to improve
kswapd's balancing a bit (me)
- regularize some of the maximum loop bounds in
vmscan.c for cosmetic purposes (William Lee Irwin)
- move page_address() to architecture-independent
code, now the removal of page->virtual is portable (William Lee Irwin)
- speed up free_area_init_core() by doing a single
pass over the pages and not using atomic ops (William Lee Irwin)
- documented the buddy allocator in page_alloc.c (William Lee Irwin)
rmap 7:
- clean up and document vmscan.c (me)
- reduce size of page struct, part one (William Lee Irwin)
- add rmap.h for other archs (untested, not for ARM) (me)
rmap 6:
- make the active and inactive_dirty list per zone,
this is finally possible because we can free pages
based on their physical address (William Lee Irwin)
- cleaned up William's code a bit (me)
- turn some defines into inlines and move those to
mm_inline.h (the includes are a mess ...) (me)
- improve the VM balancing a bit (me)
- add back inactive_target to /proc/meminfo (me)
rmap 5:
- fixed recursive buglet, introduced by directly
editing the patch for making rmap 4 ;))) (me)
rmap 4:
- look at the referenced bits in page tables (me)
rmap 3:
- forgot one FASTCALL definition (me)
rmap 2:
- teach try_to_unmap_one() about mremap() (me)
- don't assign swap space to pages with buffers (me)
- make the rmap.c functions FASTCALL / inline (me)
rmap 1:
- fix the swap leak in rmap 0 (Dave McCracken)
rmap 0:
- port of reverse mapping VM to 2.4.16 (me)
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
* Re: [PATCH] rmap 13a
2002-05-07 2:17 [PATCH] rmap 13a Rik van Riel
@ 2002-05-07 17:37 ` Christoph Hellwig
2002-05-07 18:03 ` William Lee Irwin III
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Christoph Hellwig @ 2002-05-07 17:37 UTC (permalink / raw)
To: Rik van Riel; +Cc: Samuel Ortiz, linux-mm
On Mon, May 06, 2002 at 11:17:26PM -0300, Rik van Riel wrote:
> rmap 13a:
> - NUMA changes for page_address (Samuel Ortiz)
I don't think the changes make sense. If calculating page_address is
complicated and slow enough to place it out-of-line, using page->virtual
is much better.
I'd suggest backing this patch out and instead always maintaining page->virtual
for discontigmem. While at it, as a little cleanup, you might want to
define WANT_PAGE_VIRTUAL based on CONFIG_HIGHMEM || CONFIG_DISCONTIGMEM
at the top of mm.h instead of cluttering it up.
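A minimal user-space sketch of the cleanup suggested above, assuming a simplified stand-in for struct page and a hypothetical build with CONFIG_DISCONTIGMEM set. The helper page_has_virtual() is not a kernel function; it exists only to make the result of the preprocessor decision visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical configuration for this sketch: a discontigmem build. */
#define CONFIG_DISCONTIGMEM 1

/* One place, near the top of the header, decides whether struct page
 * carries the ->virtual field. */
#if defined(CONFIG_HIGHMEM) || defined(CONFIG_DISCONTIGMEM)
#define WANT_PAGE_VIRTUAL 1
#endif

/* Simplified stand-in for the kernel's struct page (assumption). */
struct page {
	unsigned long flags;
#ifdef WANT_PAGE_VIRTUAL
	void *virtual;		/* cached kernel virtual address */
#endif
};

/* Illustration-only helper: 1 if this build keeps ->virtual, else 0. */
static int page_has_virtual(void)
{
#ifdef WANT_PAGE_VIRTUAL
	return 1;
#else
	return 0;
#endif
}
```

With this layout the rest of the header only tests WANT_PAGE_VIRTUAL, at the cost of one extra pointer per struct page on every discontigmem build.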
Christoph
* Re: [PATCH] rmap 13a
2002-05-07 17:37 ` Christoph Hellwig
@ 2002-05-07 18:03 ` William Lee Irwin III
2002-05-08 11:06 ` Samuel Ortiz
2002-05-08 18:21 ` Roman Zippel
2 siblings, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2002-05-07 18:03 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Rik van Riel, Samuel Ortiz, linux-mm
On Mon, May 06, 2002 at 11:17:26PM -0300, Rik van Riel wrote:
>> rmap 13a:
>> - NUMA changes for page_address (Samuel Ortiz)
On Tue, May 07, 2002 at 06:37:41PM +0100, Christoph Hellwig wrote:
> I don't think the changes make sense. If calculating page_address is
> complicated and slow enough to place it out-of-line, using page->virtual
> is much better.
On Tue, May 07, 2002 at 06:37:41PM +0100, Christoph Hellwig wrote:
> I'd suggest backing this patch out and instead always maintaining page->virtual
> for discontigmem. While at it, as a little cleanup, you might want to
> define WANT_PAGE_VIRTUAL based on CONFIG_HIGHMEM || CONFIG_DISCONTIGMEM
> at the top of mm.h instead of cluttering it up.
> Christoph
This is a time/space tradeoff, and page->virtual may not necessarily be the
better choice for all discontiguous memory architectures. It seems to be so for SGI's
machines, though. I advocated this as a matter of generality, despite
not having a specific example of a machine that wants it. It's not
difficult to produce examples of small-memory architectures with
discontiguous memory, though SGI's discontigmem implementation does
not appear to be in widespread use for them.
Cheers,
Bill
* Re: [PATCH] rmap 13a
2002-05-07 17:37 ` Christoph Hellwig
2002-05-07 18:03 ` William Lee Irwin III
@ 2002-05-08 11:06 ` Samuel Ortiz
2002-05-08 11:13 ` William Lee Irwin III
2002-05-08 13:40 ` Rik van Riel
2002-05-08 18:21 ` Roman Zippel
2 siblings, 2 replies; 23+ messages in thread
From: Samuel Ortiz @ 2002-05-08 11:06 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Rik van Riel, linux-mm
On Tue, 7 May 2002, Christoph Hellwig wrote:
> On Mon, May 06, 2002 at 11:17:26PM -0300, Rik van Riel wrote:
> > rmap 13a:
> > - NUMA changes for page_address (Samuel Ortiz)
>
> I don't think the changes make sense. If calculating page_address is
> complicated and slow enough to place it out-of-line, using page->virtual
> is much better.
This is right for machines that don't care about the struct page size, like
SGI ones, and big NUMA machines in general.
> I'd suggest backing this patch out and instead always maintaining page->virtual
> for discontigmem. While at it, as a little cleanup, you might want to
> define WANT_PAGE_VIRTUAL based on CONFIG_HIGHMEM || CONFIG_DISCONTIGMEM
> at the top of mm.h instead of cluttering it up.
Some discontiguous architectures (ARM, for example) may be interested in
getting rid of page->virtual, and thus shrinking the struct page size.
So you may want to keep the possibility of having
(!CONFIG_HIGHMEM)&&CONFIG_DISCONTIGMEM and not wanting page->virtual.
So, WANT_PAGE_VIRTUAL cannot be defined as CONFIG_HIGHMEM ||
CONFIG_DISCONTIGMEM.
However, I should modify my patch in order for the changes to take place
only if (!CONFIG_HIGHMEM)&&(CONFIG_DISCONTIGMEM)&&(!WANT_PAGE_VIRTUAL).
I can come back with the right changes if that makes sense to you.
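The condition above can be sketched as a three-way preprocessor selection. This is only an illustration: the enum, the PAGE_ADDRESS_METHOD macro and selected_page_address_method() are hypothetical names, and the build modelled here (discontigmem, no highmem, architecture not asking for page->virtual) is an assumption.

```c
#include <assert.h>

/* Hypothetical build for this sketch: discontiguous memory, no highmem,
 * and an architecture that does not define WANT_PAGE_VIRTUAL. */
#define CONFIG_DISCONTIGMEM 1

/* Illustration-only names for the possible ways to get a page's address. */
enum page_address_method {
	PAGE_ADDR_FROM_MEM_MAP,	/* contiguous lowmem: computed from mem_map */
	PAGE_ADDR_STORED,	/* page->virtual kept in struct page */
	PAGE_ADDR_FROM_ZONE	/* discontigmem without ->virtual: computed per zone */
};

/* The per-zone calculation is chosen only when
 * !CONFIG_HIGHMEM && CONFIG_DISCONTIGMEM && !WANT_PAGE_VIRTUAL. */
#if defined(CONFIG_HIGHMEM) || defined(WANT_PAGE_VIRTUAL)
#define PAGE_ADDRESS_METHOD PAGE_ADDR_STORED
#elif defined(CONFIG_DISCONTIGMEM)
#define PAGE_ADDRESS_METHOD PAGE_ADDR_FROM_ZONE
#else
#define PAGE_ADDRESS_METHOD PAGE_ADDR_FROM_MEM_MAP
#endif

/* Reports which method the preprocessor selected for this build. */
static enum page_address_method selected_page_address_method(void)
{
	return PAGE_ADDRESS_METHOD;
}
```

Here WANT_PAGE_VIRTUAL stays an opt-in that each architecture sets for itself, so a small-memory discontigmem machine can still drop the field.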
Cheers,
Samuel.
* Re: [PATCH] rmap 13a
2002-05-08 11:06 ` Samuel Ortiz
@ 2002-05-08 11:13 ` William Lee Irwin III
2002-05-08 13:40 ` Rik van Riel
1 sibling, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2002-05-08 11:13 UTC (permalink / raw)
To: Samuel Ortiz; +Cc: Christoph Hellwig, Rik van Riel, linux-mm
On Wed, May 08, 2002 at 04:06:58AM -0700, Samuel Ortiz wrote:
> Some discontiguous architectures (ARM, for example) may be interested in
> getting rid of page->virtual, and thus shrinking the struct page size.
> So you may want to keep the possibility of having
> (!CONFIG_HIGHMEM)&&CONFIG_DISCONTIGMEM and not wanting page->virtual.
> So, WANT_PAGE_VIRTUAL cannot be defined as CONFIG_HIGHMEM ||
> CONFIG_DISCONTIGMEM.
> However, I should modify my patch in order for the changes to take place
> only if (!CONFIG_HIGHMEM)&&(CONFIG_DISCONTIGMEM)&&(!WANT_PAGE_VIRTUAL).
> I can come back with the right changes if that makes sense to you.
Also, to be perfectly clear despite my message of perhaps extreme
conservatism regarding space conservation, I believe the time/space
tradeoff is an architectural consideration. Though I specifically
requested that a calculated UNMAP_NR_DENSE() be implemented, I by no
means oppose the usage of page->virtual for those architectures where
demonstrable performance benefits arise from the additional space
consumption of the extra field.
Cheers,
Bill
* Re: [PATCH] rmap 13a
2002-05-08 11:06 ` Samuel Ortiz
2002-05-08 11:13 ` William Lee Irwin III
@ 2002-05-08 13:40 ` Rik van Riel
1 sibling, 0 replies; 23+ messages in thread
From: Rik van Riel @ 2002-05-08 13:40 UTC (permalink / raw)
To: Samuel Ortiz; +Cc: Christoph Hellwig, linux-mm
On Wed, 8 May 2002, Samuel Ortiz wrote:
> However, I should modify my patch in order for the changes to take place
> only if (!CONFIG_HIGHMEM)&&(CONFIG_DISCONTIGMEM)&&(!WANT_PAGE_VIRTUAL).
> I can come back with the right changes if that makes sense to you.
Please read what went into rmap 13a ;)
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
* Re: [PATCH] rmap 13a
2002-05-07 17:37 ` Christoph Hellwig
2002-05-07 18:03 ` William Lee Irwin III
2002-05-08 11:06 ` Samuel Ortiz
@ 2002-05-08 18:21 ` Roman Zippel
2002-05-08 21:34 ` William Lee Irwin III
2002-05-08 21:50 ` William Lee Irwin III
2 siblings, 2 replies; 23+ messages in thread
From: Roman Zippel @ 2002-05-08 18:21 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Rik van Riel, Samuel Ortiz, linux-mm
Hi,
Christoph Hellwig wrote:
> > - NUMA changes for page_address (Samuel Ortiz)
>
> I don't think the changes make sense. If calculating page_address is
> complicated and slow enough to place it out-of-line, using page->virtual
> is much better.
>
> I'd suggest backing this patch out and instead always maintaining page->virtual
> for discontigmem. While at it, as a little cleanup, you might want to
> define WANT_PAGE_VIRTUAL based on CONFIG_HIGHMEM || CONFIG_DISCONTIGMEM
> at the top of mm.h instead of cluttering it up.
I'd suggest we move page_address to asm/page.h (as the counterpart of
virt_to_page). discontigmem configs can then use some more efficient
table lookup. Other configs usually want to implement it better as:
#define page_address(page) ((((page) - mem_map) << PAGE_SHIFT) + PAGE_OFFSET)
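To see what this macro computes, here is a small user-space model. It is a sketch only: the 4 KiB page size, the i386-style PAGE_OFFSET value, the 16-entry mem_map and the reduced struct page are all assumptions, and addresses are kept as integers so nothing is ever dereferenced.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12				/* assumed 4 KiB pages */
#define PAGE_OFFSET ((uintptr_t)0xC0000000u)	/* assumed i386-style kernel base */

/* Reduced stand-in for the kernel's struct page (assumption). */
struct page {
	unsigned long flags;
};

/* Model of a flat mem_map: entry N describes physical page frame N. */
static struct page mem_map[16];

/* The macro from the message above, with an integer cast added so the
 * user-space model works on plain numbers rather than kernel pointers. */
#define page_address(page) \
	((((uintptr_t)((page) - mem_map)) << PAGE_SHIFT) + PAGE_OFFSET)
```

In this model page_address(&mem_map[3]) is PAGE_OFFSET plus three pages. The arithmetic is only meaningful when mem_map is a single array covering memory that is contiguous and linearly mapped from PAGE_OFFSET.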
bye, Roman
* Re: [PATCH] rmap 13a
2002-05-08 18:21 ` Roman Zippel
@ 2002-05-08 21:34 ` William Lee Irwin III
2002-05-08 22:34 ` Roman Zippel
2002-05-08 21:50 ` William Lee Irwin III
1 sibling, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2002-05-08 21:34 UTC (permalink / raw)
To: Roman Zippel; +Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
On Wed, May 08, 2002 at 08:21:37PM +0200, Roman Zippel wrote:
> I'd suggest we move page_address to asm/page.h (as the counterpart of
> virt_to_page). discontigmem configs can then use some more efficient
> table lookup. Other configs usually want to implement it better as:
> #define page_address(page) ((((page) - mem_map) << PAGE_SHIFT) + PAGE_OFFSET)
> bye, Roman
Trouble is, there are only four different useful variations.
A:
static inline void *page_address(struct page *page)
{
	return __va((page - mem_map) << PAGE_SHIFT);
}
B:
static inline void *page_address(struct page *page)
{
	return page->virtual;
}
C:
static inline void *page_address(struct page *page)
{
	zone_t *zone = page_zone(page);
	return __va(((page - zone->zone_mem_map) << PAGE_SHIFT)
			+ zone->zone_start_paddr);
}
D:
static inline void *page_address(struct page *page)
{
	zone_t *zone = page_zone(page);
	return __va((UNMAP_NR_DENSE(page - zone->zone_mem_map) << PAGE_SHIFT)
			+ zone->zone_start_paddr);
}
Where A is fine without highmem or discontigmem, B is required any time
there is highmem, C is needed for discontiguous non-highmem, and D is
required for SGI-based discontigmem using MAP_NR_DENSE() to pack pages
from a discontiguous region into a single zone to avoid having mem_map
larger than the largest contiguous memory region or having too many zones
to be tractable. Also, C and D could be collapsed to one case if
UNMAP_NR_DENSE() is defined as an identity mapping for those not using it.
C and D are of course on the space conservation side of the time/space
tradeoff. (C is actually already needed on NUMA-Q.)
What I believe *really* needs to be straightened out here is how the
various architectures get to select their favorite variant.
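The remark about collapsing C and D can be sketched in user space as follows. Everything here is a model under stated assumptions: sketch_va() stands in for __va() with an i386-style offset, the zone is passed in explicitly instead of being found through page_zone(), and demo_pages/demo_zone are made-up sample data.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12				/* assumed 4 KiB pages */
#define PAGE_OFFSET ((uintptr_t)0xC0000000u)	/* assumed i386-style kernel base */

/* Identity default for architectures that do not pack pages; an arch
 * that does would supply its own definition before this point. */
#ifndef UNMAP_NR_DENSE
#define UNMAP_NR_DENSE(idx) (idx)
#endif

struct page {
	unsigned long flags;
};

/* Reduced stand-in for zone_t with only the two fields used here. */
typedef struct zone {
	struct page *zone_mem_map;	/* first struct page of the zone */
	uintptr_t zone_start_paddr;	/* physical address of that page */
} zone_t;

/* Stand-in for __va(), assuming a simple offset direct mapping. */
static uintptr_t sketch_va(uintptr_t paddr)
{
	return paddr + PAGE_OFFSET;
}

/* Variants C and D as one function: D when UNMAP_NR_DENSE is supplied
 * by the architecture, C when the identity default applies. */
static uintptr_t zone_page_address(const zone_t *zone, const struct page *page)
{
	uintptr_t idx = (uintptr_t)(page - zone->zone_mem_map);

	return sketch_va((UNMAP_NR_DENSE(idx) << PAGE_SHIFT)
			 + zone->zone_start_paddr);
}

/* Made-up sample zone starting at physical 0x10000000. */
static struct page demo_pages[8];
static zone_t demo_zone = { demo_pages, (uintptr_t)0x10000000u };
```

With the identity mapping the compiler sees exactly variant C, so architectures that never pack pages pay nothing for the shared code path.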
Cheers,
Bill
* Re: [PATCH] rmap 13a
2002-05-08 18:21 ` Roman Zippel
2002-05-08 21:34 ` William Lee Irwin III
@ 2002-05-08 21:50 ` William Lee Irwin III
1 sibling, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2002-05-08 21:50 UTC (permalink / raw)
To: Roman Zippel; +Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
On Wed, May 08, 2002 at 08:21:37PM +0200, Roman Zippel wrote:
> I'd suggest we move page_address to asm/page.h (as the counterpart of
> virt_to_page). discontigmem configs can then use some more efficient
> table lookup. Other configs usually want to implement it better as:
> #define page_address(page) ((((page) - mem_map) << PAGE_SHIFT) + PAGE_OFFSET)
> bye, Roman
Sorry, I missed the part about table lookup.
If table lookup is wanted, I feel that should also be a generic option.
There is nothing inherently architecture-specific about using a table-
driven method of calculating page_address().
But why isn't zone_table[] already an instance of such a table?
An annotated description of the generic version is:
/* the table lookup */
zone = zone_table[page->flags >> ZONE_SHIFT]
/* the phys offset of the table entry */
__va(zone->zone_start_paddr
	+
	/* calculating the offset within the table region */
	((page - zone->zone_mem_map)
	/* scaling the offset */
	<< PAGE_SHIFT))
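The annotated description above can be exercised with a small user-space model. This is a sketch under assumptions: the zone index is taken to live in the top 8 bits of page->flags, the two sample zones and their start addresses are invented, and set_page_zone() and demo_lookup() are illustration-only helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define PAGE_SHIFT  12				/* assumed 4 KiB pages */
#define PAGE_OFFSET ((uintptr_t)0xC0000000u)	/* assumed i386-style kernel base */
/* Assumption: the top 8 bits of page->flags hold the zone_table index. */
#define ZONE_SHIFT  (sizeof(unsigned long) * CHAR_BIT - 8)

struct page {
	unsigned long flags;
};

typedef struct zone {
	struct page *zone_mem_map;	/* first struct page of the zone */
	uintptr_t zone_start_paddr;	/* physical address of that page */
} zone_t;

/* Two invented zones: one at physical 0, one at physical 0x20000000. */
static struct page node0_pages[4], node1_pages[4];
static zone_t node0_zone = { node0_pages, (uintptr_t)0x00000000u };
static zone_t node1_zone = { node1_pages, (uintptr_t)0x20000000u };
static zone_t *zone_table[2] = { &node0_zone, &node1_zone };

/* Illustration-only helper: record the zone index in page->flags. */
static void set_page_zone(struct page *page, unsigned long zone_idx)
{
	page->flags &= ~(~0UL << ZONE_SHIFT);
	page->flags |= zone_idx << ZONE_SHIFT;
}

/* Table lookup, then offset within the zone, scaled to bytes. */
static uintptr_t page_address(const struct page *page)
{
	const zone_t *zone = zone_table[page->flags >> ZONE_SHIFT];
	uintptr_t offset = (uintptr_t)(page - zone->zone_mem_map) << PAGE_SHIFT;

	return PAGE_OFFSET + zone->zone_start_paddr + offset;
}

/* Illustration-only driver: tag a sample page and look it up. */
static uintptr_t demo_lookup(unsigned long zone_idx, unsigned long page_idx)
{
	struct page *page;

	assert(zone_idx < 2 && page_idx < 4);
	page = &zone_table[zone_idx]->zone_mem_map[page_idx];
	set_page_zone(page, zone_idx);
	return page_address(page);
}
```

The lookup costs one shift, one table load and two zone field loads, which is the time side of the tradeoff against storing page->virtual.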
Cheers,
Bill
* Re: [PATCH] rmap 13a
2002-05-08 21:34 ` William Lee Irwin III
@ 2002-05-08 22:34 ` Roman Zippel
2002-05-08 22:42 ` William Lee Irwin III
0 siblings, 1 reply; 23+ messages in thread
From: Roman Zippel @ 2002-05-08 22:34 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
Hi,
William Lee Irwin III wrote:
> A:
> static inline void *page_address(struct page *page)
> {
> 	return __va((page - mem_map) << PAGE_SHIFT);
> }
This is very broken.
> If table lookup is wanted, I feel that should also be a generic option.
> There is nothing inherently architecture-specific about using a table-
> driven method of calculating page_address().
Archs already do the kaddr->node lookup. Archs set up the virtual mapping
and the pgdat nodes; they know best how they are laid out. Why do you
want to generalize this?
bye, Roman
* Re: [PATCH] rmap 13a
2002-05-08 22:34 ` Roman Zippel
@ 2002-05-08 22:42 ` William Lee Irwin III
2002-05-08 22:50 ` William Lee Irwin III
2002-05-08 23:26 ` Roman Zippel
0 siblings, 2 replies; 23+ messages in thread
From: William Lee Irwin III @ 2002-05-08 22:42 UTC (permalink / raw)
To: Roman Zippel; +Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
William Lee Irwin III wrote:
>> A:
>> static inline void *page_address(struct page *page)
>> {
>> 	return __va((page - mem_map) << PAGE_SHIFT);
>> }
On Thu, May 09, 2002 at 12:34:34AM +0200, Roman Zippel wrote:
> This is very broken.
I beg your pardon? AFAICT it's equivalent to the macro you yourself
posted.
include/asm-i386/page.h:133:#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
It makes only 3 assumptions:
(1) memory is contiguous
(2) memory starts from 0
(3) mem_map is in 1:1 order-preserving correspondence with phys pages
William Lee Irwin III wrote:
>> If table lookup is wanted, I feel that should also be a generic option.
>> There is nothing inherently architecture-specific about using a table-
>> driven method of calculating page_address().
On Thu, May 09, 2002 at 12:34:34AM +0200, Roman Zippel wrote:
> Archs already do the kaddr->node lookup. Archs set up the virtual mapping
> and the pgdat nodes; they know best how they are laid out. Why do you
> want to generalize this?
Because they were doing it before and they all duplicated each others' code.
Cheers,
Bill
* Re: [PATCH] rmap 13a
2002-05-08 22:42 ` William Lee Irwin III
@ 2002-05-08 22:50 ` William Lee Irwin III
2002-05-08 23:26 ` Roman Zippel
1 sibling, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2002-05-08 22:50 UTC (permalink / raw)
To: Roman Zippel, Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
On Wed, May 08, 2002 at 03:42:55PM -0700, William Lee Irwin III wrote:
> It makes only 3 assumptions:
> (1) memory is contiguous
> (2) memory starts from 0
> (3) mem_map is in 1:1 order-preserving correspondence with phys pages
(4) memory is direct-mapped.
Cheers,
Bill
* Re: [PATCH] rmap 13a
2002-05-08 22:42 ` William Lee Irwin III
2002-05-08 22:50 ` William Lee Irwin III
@ 2002-05-08 23:26 ` Roman Zippel
2002-05-09 1:29 ` William Lee Irwin III
1 sibling, 1 reply; 23+ messages in thread
From: Roman Zippel @ 2002-05-08 23:26 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
Hi,
William Lee Irwin III wrote:
> > This is very broken.
>
> I beg your pardon? AFAICT it's equivalent to the macro you yourself
> posted.
>
> include/asm-i386/page.h:133:#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
>
> It makes only 3 assumptions:
> (1) memory is contiguous
> (2) memory starts from 0
> (3) mem_map is in 1:1 order-preserving correspondence with phys pages
You should not only look at the i386 code, if you want to create generic
functions.
> On Thu, May 09, 2002 at 12:34:34AM +0200, Roman Zippel wrote:
> > Archs already do the kaddr->node lookup. Archs set up the virtual mapping
> > and the pgdat nodes; they know best how they are laid out. Why do you
> > want to generalize this?
>
> Because they were doing it before and they all duplicated each others' code.
Table lookups can only be optimized if you know the memory layout and
only the archs know that.
Only the code for the simple case was copied.
bye, Roman
* Re: [PATCH] rmap 13a
2002-05-08 23:26 ` Roman Zippel
@ 2002-05-09 1:29 ` William Lee Irwin III
2002-05-09 12:33 ` Roman Zippel
0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2002-05-09 1:29 UTC (permalink / raw)
To: Roman Zippel; +Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
William Lee Irwin III wrote:
>> I beg your pardon? AFAICT it's equivalent to the macro you yourself
>> posted.
>> include/asm-i386/page.h:133:#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
>> It makes only 3 assumptions:
>> (1) memory is contiguous
>> (2) memory starts from 0
>> (3) mem_map is in 1:1 order-preserving correspondence with phys pages
On Thu, May 09, 2002 at 01:26:34AM +0200, Roman Zippel wrote:
> You should not only look at the i386 code, if you want to create generic
> functions.
It's not only i386. Other architectures are able to do likewise if
they satisfy the preconditions. And this is exactly one of four
variations, where all four together are able to handle all cases.
(In fact, just reverting to B works as a catch-all.) I am aware that
there are architectures that do not direct-map physical to virtual
within zones, and they should either retain ->virtual or implement
UNMAP_NR_DENSE().
William Lee Irwin III wrote:
>> Because they were doing it before and they all duplicated each others' code.
On Thu, May 09, 2002 at 12:34:34AM +0200, Roman Zippel wrote:
> Table lookups can only be optimized if you know the memory layout and
> only the archs know that.
> Only the code for the simple case was copied.
The VM should be informed of the memory layout by properly initialized
data structures...
There doesn't seem to be enough depth to this subject to merit this
much discussion. Are we speaking at cross-purposes? Since I wrote a
bit of this, is there an issue you're having you'd like me to address?
I have a sun3 that's booted Linux in the past, so I might be able to
reproduce m68k-specific issues that arise.
Cheers,
Bill
* Re: [PATCH] rmap 13a
2002-05-09 1:29 ` William Lee Irwin III
@ 2002-05-09 12:33 ` Roman Zippel
2002-05-09 14:09 ` William Lee Irwin III
0 siblings, 1 reply; 23+ messages in thread
From: Roman Zippel @ 2002-05-09 12:33 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
Hi,
William Lee Irwin III wrote:
> > You should not only look at the i386 code, if you want to create generic
> > functions.
>
> It's not only i386. Other architectures are able to do likewise if
> they satisfy the preconditions. And this is exactly one of four
> variations, where all four together are able to handle all cases.
> (In fact, just reverting to B works as a catch-all.)
Your preconditions were no CONFIG_DISCONTIGMEM and no CONFIG_HIGHMEM.
This is true for m68k, yet it still breaks every single one of your
assumptions. And even on other archs, where do these preconditions
require physical memory to start at 0?
> There doesn't seem to be enough depth to this subject to merit this
> much discussion. Are we speaking at cross-purposes? Since I wrote a
> bit of this, is there an issue you're having you'd like me to address?
> I have a sun3 that's booted Linux in the past, so I might be able to
> reproduce m68k-specific issues that arise.
It's really not m68k specific. You are trying to generalize a very small
part of the whole problem. First you only take some special cases (A.
and B.) and the rest was completely arch specific so far. You have to
define the complete model of how virtual and physical addresses and the
pgdat/index tuple relate to each other, before you can generalize
something of it. So far it was completely up to the archs to define this
relationship, with only a few assumptions from the generic code.
bye, Roman
* Re: [PATCH] rmap 13a
2002-05-09 12:33 ` Roman Zippel
@ 2002-05-09 14:09 ` William Lee Irwin III
2002-05-09 15:36 ` Roman Zippel
0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2002-05-09 14:09 UTC (permalink / raw)
To: Roman Zippel; +Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
William Lee Irwin III wrote:
>> It's not only i386. Other architectures are able to do likewise if
>> they satisfy the preconditions. And this is exactly one of four
>> variations, where all four together are able to handle all cases.
>> (In fact, just reverting to B works as a catch-all.)
On Thu, May 09, 2002 at 02:33:18PM +0200, Roman Zippel wrote:
> Your preconditions were no CONFIG_DISCONTIGMEM and no CONFIG_HIGHMEM.
> This is true for m68k, yet it still breaks every single one of your
> assumptions. And even on other archs, where do these preconditions
> require physical memory to start at 0?
I stated starting at 0 as one of the preconditions for page - mem_map
based calculations. The only missing piece of information is the
starting address, which is not very difficult to take care of so as
to relax this invariant (compilers *should* optimize out 0). Hence,
static inline void *page_address(struct page *page)
{
	return __va(MEM_START + ((page - mem_map) << PAGE_SHIFT));
}
Unfortunately there isn't a CONFIG_MEM_STARTS_AT_ZERO or a MEM_START.
To date I've gotten very few and/or very unclear responses from arch
maintainers. I'm very interested in getting info from arches such as
this but have had approximately zero input along this front thus far.
Do you have any suggestions as to a decent course of action here? My
first thought is a small set of #defines in include/asm-*/param.h or
some appropriate arch header.
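As a sketch of what such a set of #defines might look like, here is a user-space model with a hypothetical MEM_START. The values, the reduced struct page, and the phys_to_virt_model() stand-in for __va() (which assumes the architecture maps physical MEM_START at PAGE_OFFSET) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12				/* assumed 4 KiB pages */
#define PAGE_OFFSET ((uintptr_t)0xC0000000u)	/* assumed kernel virtual base */
/* Hypothetical per-arch define: physical address of the page that
 * mem_map[0] describes. An arch whose memory starts at 0 would use 0. */
#define MEM_START   ((uintptr_t)0x08000000u)

struct page {
	unsigned long flags;
};

static struct page mem_map[16];

/* Stand-in for __va(), assuming physical MEM_START maps to PAGE_OFFSET. */
static uintptr_t phys_to_virt_model(uintptr_t paddr)
{
	assert(paddr >= MEM_START);
	return paddr - MEM_START + PAGE_OFFSET;
}

/* Physical address of a page, with the start of memory made explicit. */
static uintptr_t page_to_phys_model(const struct page *page)
{
	return MEM_START + ((uintptr_t)(page - mem_map) << PAGE_SHIFT);
}

/* The calculation from the message above, on integers. */
static uintptr_t page_address(const struct page *page)
{
	return phys_to_virt_model(page_to_phys_model(page));
}
```

Under these assumptions MEM_START cancels out and the result is PAGE_OFFSET plus the page's offset into mem_map; with MEM_START defined as 0 the code reduces to variant A.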
On Thu, May 09, 2002 at 02:33:18PM +0200, Roman Zippel wrote:
> It's really not m68k specific. You are trying to generalize a very small
> part of the whole problem. First you only take some special cases (A.
> and B.) and the rest was completely arch specific so far. You have to
> define the complete model of how virtual and physical addresses and the
> pgdat/index tuple relate to each other, before you can generalize
> something of it. So far it was completely up to the archs to define this
> relationship, with only a few assumptions from the generic code.
I think you're overestimating how much there is to do here. It is either
inefficient to calculate the address due to the deep arch issues (e.g.
a low-level virtual remapping to crossdress ridiculously discontiguous
memory maps) or the invariants with zones and/or mem_map make it easy.
The only instances in which a zone is not a physically contiguous range
of memory with a corresponding contiguous range of virtual memory to
map it are CONFIG_HIGHMEM, SGI's CONFIG_DISCONTIGMEM, and architectures
with memory so discontiguous they remap everything for virtual contiguity.
The general CONFIG_DISCONTIGMEM case does not distinguish between
MAP_NR_DENSE() being present and/or an identity mapping and MAP_NR_DENSE()
being there and doing something, so I'm missing information on high-end
machines as well.
I believe the real issue is that architectures don't yet export enough
information to select the version they really want. There just aren't
enough variations on this theme to warrant doing this in arch code, and
I think I got some consensus on that by the mere acceptance of the
patch to remove ->virtual in most instances.
If we can agree on this much then can we start pinning down a more
precise method of selecting the method of address calculation? I'm
very interested in maintaining this code and making it suitable for
all architectures.
Cheers,
Bill
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
* Re: [PATCH] rmap 13a
2002-05-09 14:09 ` William Lee Irwin III
@ 2002-05-09 15:36 ` Roman Zippel
2002-05-09 17:42 ` William Lee Irwin III
0 siblings, 1 reply; 23+ messages in thread
From: Roman Zippel @ 2002-05-09 15:36 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
Hi,
William Lee Irwin III wrote:
> I stated starting at 0 as one of the preconditions for page - mem_map
> based calculations. The only missing piece of information is the
> starting address, which is not very difficult to take care of so as
> to relax this invariant (compilers *should* optimize out 0). Hence,
>
> static inline void *page_address(struct page *page)
> {
> 	return __va(MEM_START + ((page - mem_map) << PAGE_SHIFT));
> }
There is no generic MEM_START, but there is PAGE_OFFSET. Why don't you
want to use this instead?
> Unfortunately there isn't a CONFIG_MEM_STARTS_AT_ZERO or a MEM_START.
And it's not needed, why should the vm care about the physical memory
location?
> To date I've gotten very few and/or very unclear responses from arch
> maintainers. I'm very interested in getting info from arches such as
> this but have had approximately zero input along this front thus far.
> Do you have any suggestions as to a decent course of action here? My
> first thought is a small set of #defines in include/asm-*/param.h or
> some appropriate arch header.
What do you want to define there?
> I believe the real issue is that architectures don't yet export enough
> information to select the version they really want. There just aren't
> enough variations on this theme to warrant doing this in arch code, and
> I think I got some consensus on that by the mere acceptance of the
> patch to remove ->virtual in most instances.
The basic mechanism is often the same, that's true. The problem is to
allow the archs an efficient conversion. Only the arch specific code
knows how the memory is laid out and can use this information to
optimize the conversion at compile time by making some of the variables
constant. As soon as you have managed to generalize this, I'm sure
highmem will be the only special case left you have to deal with.
> If we can agree on this much then can we start pinning down a more
> precise method of selecting the method of address calculation? I'm
> very interested in maintaining this code and making it suitable for
> all architectures.
Why is it so important to move it out of the arch code? The simple case
is trivial enough to be copied around or maybe put some templates into
asm-generic, but I'd prefer to leave the archs complete control over
this.
bye, Roman
* Re: [PATCH] rmap 13a
2002-05-09 15:36 ` Roman Zippel
@ 2002-05-09 17:42 ` William Lee Irwin III
2002-05-09 21:45 ` Roman Zippel
0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2002-05-09 17:42 UTC (permalink / raw)
To: Roman Zippel; +Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
William Lee Irwin III wrote:
>> I stated starting at 0 as one of the preconditions for page - mem_map
>> based calculations. The only missing piece of information is the
>> starting address, which is not very difficult to take care of so as
>> to relax this invariant (compilers *should* optimize out 0). Hence,
>> static inline void *page_address(struct page *page)
>> {
>> 	return __va(MEM_START + ((page - mem_map) << PAGE_SHIFT));
>> }
On Thu, May 09, 2002 at 05:36:22PM +0200, Roman Zippel wrote:
> There is no generic MEM_START, but there is PAGE_OFFSET. Why don't you
> want to use this instead?
MEM_START would be the lowest physical address, not the lowest virtual.
__va(PAGE_OFFSET + ((page - mem_map) << PAGE_SHIFT)) would yield
garbage... Perhaps __pa(PAGE_OFFSET) would work while relaxing only the
"memory starts at 0" precondition? It should come out to 0 for the
architectures with memory starting at 0, and the other preconditions
guarantee that it's the lowest physical, but that breaks if the lowest
physical isn't mapped to PAGE_OFFSET...
William Lee Irwin III wrote:
>> Unfortunately there isn't a CONFIG_MEM_STARTS_AT_ZERO or a MEM_START.
On Thu, May 09, 2002 at 05:36:22PM +0200, Roman Zippel wrote:
> And it's not needed, why should the vm care about the physical memory
> location?
VM is about translating virtual to physical, and so it must know
something resembling the physical address of a page just to edit PTE's?
William Lee Irwin III wrote:
>> To date I've gotten very few and/or very unclear responses from arch
>> maintainers. I'm very interested in getting info from arches such as
>> this but have had approximately zero input along this front thus far.
>> Do you have any suggestions as to a decent course of action here? My
>> first thought is a small set of #defines in include/asm-*/param.h or
>> some appropriate arch header.
On Thu, May 09, 2002 at 05:36:22PM +0200, Roman Zippel wrote:
> What do you want to define there?
(1) ARCH_SLOW_ALU
Address calculation doesn't win on some CPU's with ridiculously
slow ALU's (well, slow relative to memory fetch)
(2) ARCH_DENSEMAPS_ZONES (probably better to define it elsewhere)
MAP_NR_DENSE() makes address calculation schemes somewhat more
expensive
(3) ARCH_REMAPS_DISCONTIG
__va() and __pa() may do strange things when physical memory is
mapped in strange ways for virtual contiguity, don't mess
with these guys
(4) ARCH_TRIVIAL_MEMORY_LAYOUT
We're the super-easy case that nobody complains about =)
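To make the proposal above a little more concrete, here is a minimal
sketch of how generic code might pick an address calculation method from
such defines. The four ARCH_* names are the ones suggested in the list;
the enum, the function name, the order of precedence and the #error
fallback are all assumptions made for illustration and were not part of
any posted patch.

```c
/* Hypothetical: an arch header would define whichever of the four
 * ARCH_* macros describes its memory layout. For this sketch we
 * pretend to be the "super-easy" case. */
#define ARCH_TRIVIAL_MEMORY_LAYOUT 1

/* Illustrative names only; none of these exist in the kernel. */
enum page_address_method {
	PA_STORED_VIRTUAL,	/* keep page->virtual, the ALU is too slow */
	PA_ARCH_PRIVATE,	/* arch remaps memory, leave it to the arch */
	PA_DENSE_CALC,		/* calculate, going through MAP_NR_DENSE() */
	PA_LINEAR_CALC		/* calculate directly from page - mem_map */
};

/* Resolved entirely at compile time; the precedence is an assumption. */
static enum page_address_method page_address_method(void)
{
#if defined(ARCH_SLOW_ALU)
	return PA_STORED_VIRTUAL;
#elif defined(ARCH_REMAPS_DISCONTIG)
	return PA_ARCH_PRIVATE;
#elif defined(ARCH_DENSEMAPS_ZONES)
	return PA_DENSE_CALC;
#elif defined(ARCH_TRIVIAL_MEMORY_LAYOUT)
	return PA_LINEAR_CALC;
#else
#error "arch must describe its memory layout"
#endif
}
```

With only ARCH_TRIVIAL_MEMORY_LAYOUT defined, the function collapses to
a constant return of PA_LINEAR_CALC.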
William Lee Irwin III wrote:
>> I believe the real issue is that architectures don't yet export enough
>> information to select the version they really want. There just aren't
>> enough variations on this theme to warrant doing this in arch code, and
>> I think I got some consensus on that by the mere acceptance of the
>> patch to remove ->virtual in most instances.
On Thu, May 09, 2002 at 05:36:22PM +0200, Roman Zippel wrote:
> The basic mechanism is often the same, that's true. The problem is to
> allow the archs an efficient conversion. Only the arch specific code
> knows how the memory is laid out and can use this information to
> optimize the conversion at compile time by making some of the variables
> constant. As soon as you have managed to generalize this, I'm sure
> highmem will be the only special case left you have to deal with.
This sounds like an important direction I should investigate then. I've
already got a notion of how highmem could be dealt with (c.f. indexing
into kmap pool post). I'll have to dig around to see what architectures
export and see what needs to be standardized for an address calculation
scheme based on constructing the address with the help of those constants.
William Lee Irwin III wrote:
>> If we can agree on this much then can we start pinning down a more
>> precise method of selecting the method of address calculation? I'm
>> very interested in maintaining this code and making it suitable for
>> all architectures.
On Thu, May 09, 2002 at 05:36:22PM +0200, Roman Zippel wrote:
> Why is it so important to move it out of the arch code? The simple case
> is trivial enough to be copied around or maybe put some templates into
> asm-generic, but I'd prefer to leave the archs complete control over
> this.
mem_map was bloated to the point where highmem machines couldn't get
enough ZONE_NORMAL to boot. Control definitely needs to be exerted over
the space consumption of mem_map, and killing ->virtual wherever/whenever
possible is a large part of that. I even have some evidence that what
control of that space consumption there is now may still be insufficient.
asm-generic has the small problem that it divorces the format of struct
page from the definition of page_address(). Also, it shouldn't really be
optional; the VM should be given enough information to do the right thing.
Cheers,
Bill
* Re: [PATCH] rmap 13a
2002-05-09 17:42 ` William Lee Irwin III
@ 2002-05-09 21:45 ` Roman Zippel
2002-05-09 23:13 ` William Lee Irwin III
0 siblings, 1 reply; 23+ messages in thread
From: Roman Zippel @ 2002-05-09 21:45 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
Hi,
William Lee Irwin III wrote:
> MEM_START would be the lowest physical address, not the lowest virtual.
> __va(PAGE_OFFSET + ((page - mem_map) << PAGE_SHIFT)) would yield
> garbage... Perhaps __pa(PAGE_OFFSET) would work while relaxing only the
> "memory starts at 0" precondition? It should come out to 0 for the
> architectures with memory starting at 0, and the other preconditions
> guarantee that it's the lowest physical, but that breaks if the lowest
> physical isn't mapped to PAGE_OFFSET...
Sigh...
Why do you insist on calculating with the physical address? PAGE_OFFSET
is already a virtual address and without CONFIG_DISCONTIGMEM the page at
PAGE_OFFSET always corresponds to memmap[0] (on every arch).
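The two formulations being argued over can be compared in a small
userspace model. This is only a sketch: the layout constants are made up,
model_va()/model_pa() stand in for a purely linear __va()/__pa(), and
addresses are plain integers rather than pointers.

```c
#include <stdint.h>

#define PAGE_SHIFT	12
/* Hypothetical layout: RAM starts at physical 0x08000000 and is mapped
 * linearly at virtual 0xC0000000. */
#define MEM_START	UINT64_C(0x08000000)
#define PAGE_OFFSET	UINT64_C(0xC0000000)

struct page { int dummy; };
static struct page mem_map[16];

/* Stand-ins for a linear __va()/__pa(); real archs may differ. */
static uint64_t model_va(uint64_t phys) { return phys - MEM_START + PAGE_OFFSET; }
static uint64_t model_pa(uint64_t virt) { return virt - PAGE_OFFSET + MEM_START; }

/* Via the physical address, which requires knowing MEM_START. */
static uint64_t page_address_phys(const struct page *page)
{
	return model_va(MEM_START + ((uint64_t)(page - mem_map) << PAGE_SHIFT));
}

/* Purely virtual: mem_map[0] is taken to sit at PAGE_OFFSET. */
static uint64_t page_address_virt(const struct page *page)
{
	return PAGE_OFFSET + ((uint64_t)(page - mem_map) << PAGE_SHIFT);
}
```

Under these assumptions both functions return the same value for every
page, and only the second one avoids mentioning MEM_START at all.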
> > And it's not needed, why should the vm care about the physical memory
> > location?
>
> VM is about translating virtual to physical, and so it must know
> something resembling the physical address of a page just to edit PTE's?
The vm doesn't manage the memory based on the physical address, so all
it needs are functions to convert to/from a physical address, it doesn't
care about the value itself.
I'll scratch the rest of the mail and try to describe the more general problem.
We have three possibilities to address a memory page: virtual address,
physical address and pgdat+index. In the simplest case we can map all
of them linearly for a continuous memory configuration. Otherwise the mapping
between physical address and pgdat+index will always involve some lookup
mechanism. It's desirable to have at least one linear mapping, so the
virtual mapping should be aligned either to the physical address space
or the pgdat array(s). m68k does the latter, everyone else the first.
Now the archs or your general code has to provide mappings between every
address space, so we have now:
- virt_to_phys()/phys_to_virt()
- pfn_to_page()/page_to_pfn()
- virt_to_page()/page_to_virt^Wpage_addr()
Please take a look at asm-ppc/page.h:__va()/__pa(). Here you have an
example that even for linear mappings, we use some tricks to optimize
this. How do you want to generalize this? So every arch specifies how to
map between the address spaces and provides special functions to do the
mapping, what is now left for the generic code?
BTW 5 out of the 6 functions are currently defined by the archs, what
makes the 6th so special?
Highmem is the only special case that can be handled by generic code,
because the basic problem is the same on every arch.
bye, Roman
* Re: [PATCH] rmap 13a
2002-05-09 21:45 ` Roman Zippel
@ 2002-05-09 23:13 ` William Lee Irwin III
2002-05-10 11:37 ` Roman Zippel
0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2002-05-09 23:13 UTC (permalink / raw)
To: Roman Zippel; +Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
On Thu, May 09, 2002 at 11:45:04PM +0200, Roman Zippel wrote:
> I'll scratch the rest of the mail and try to describe the more general problem.
> We have three possibilities to address a memory page: virtual address,
> physical address and pgdat+index. In the simplest case we can map all
> of them linearly for a continuous memory configuration. Otherwise the mapping
> between physical address and pgdat+index will always involve some lookup
> mechanism. It's desirable to have at least one linear mapping, so the
> virtual mapping should be aligned either to the physical address space
> or the pgdat array(s). m68k does the latter, everyone else the first.
I'm not entirely sure what you mean by "aligned to the pgdat array(s)".
I've poked around m68k arch code (esp. since sun3 is not the most
supported of the m68k platforms), and I wouldn't describe anything I
saw down there in those terms. Could you clarify this somewhat?
On Thu, May 09, 2002 at 11:45:04PM +0200, Roman Zippel wrote:
> Now the archs or your general code has to provide mappings between every
> address space, so we have now:
> - virt_to_phys()/phys_to_virt()
> - pfn_to_page()/page_to_pfn()
> - virt_to_page()/page_to_virt^Wpage_addr()
> Please take a look at asm-ppc/page.h:__va()/__pa(). Here you have an
> example that even for linear mappings, we use some tricks to optimize
> this. How do you want to generalize this? So every arch specifies how to
> map between the address spaces and provides special functions to do the
> mapping, what is now left for the generic code?
It seems reasonable to expect __va()/__pa() to come from arch code...
Maybe a more compelling example might be some of the trickery you
have in mind for optimizing page_address() on a per-arch basis? I'd
be very interested in seeing a bit of that, and it might give me
something to hold on to since I certainly saw nothing like that when
I genericized it. Believe it or not I'm willing to be convinced, I'm
just not going to change my mind without due cause. Also, why is it
attracting your attention? Is it creating significant overhead for you?
On Thu, May 09, 2002 at 11:45:04PM +0200, Roman Zippel wrote:
> BTW 5 out of the 6 functions are currently defined by the archs, what
> makes the 6th so special?
> Highmem is the only special case that can be handled by generic code,
> because the basic problem is the same on every arch.
struct page is generic. That, and it consumes a great deal of memory,
and so must be closely controlled. I'm very intent on calculating
page_address() especially on those machines the majority of whose
kernel virtual address spaces are now consumed chiefly by mem_map.
Cheers,
Bill
* Re: [PATCH] rmap 13a
2002-05-09 23:13 ` William Lee Irwin III
@ 2002-05-10 11:37 ` Roman Zippel
2002-05-10 16:28 ` William Lee Irwin III
0 siblings, 1 reply; 23+ messages in thread
From: Roman Zippel @ 2002-05-10 11:37 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
Hi,
On Thu, 9 May 2002, William Lee Irwin III wrote:
> > We have three possibilities to address a memory page: virtual address,
> > physical address and pgdat+index. In the simplest case we can map all
> > of them linearly for a continuous memory configuration. Otherwise the mapping
> > between physical address and pgdat+index will always involve some lookup
> > mechanism. It's desirable to have at least one linear mapping, so the
> > virtual mapping should be aligned either to the physical address space
> > or the pgdat array(s). m68k does the latter, everyone else the first.
>
> I'm not entirely sure what you mean by "aligned to the pgdat array(s)".
Mapping everything into a single virtual area, so that the virtual address
can be used as an index in the memmap array, e.g.
#define virt_to_page(kaddr) (mem_map + (((unsigned long)(kaddr)-PAGE_OFFSET) >> PAGE_SHIFT))
#define page_to_virt(page) ((((page) - mem_map) << PAGE_SHIFT) + PAGE_OFFSET)
> I've poked around m68k arch code (esp. since sun3 is not the most
> supported of the m68k platforms), and I wouldn't describe anything I
> saw down there in those terms. Could you clarify this somewhat?
sun3 doesn't have to deal with discontinuous memory, so you have to look at
the motorola code.
> It seems reasonable to expect __va()/__pa() to come from arch code...
But you cannot separate this, all the conversion functions relate to each
other.
> Maybe a more compelling example might be some of the trickery you
> have in mind for optimizing page_address() on a per-arch basis? I'd
> be very interested in seeing a bit of that, and it might give me
> something to hold on to since I certainly saw nothing like that when
> I genericized it. Believe it or not I'm willing to be convinced, I'm
> just not going to change my mind without due cause.
A generic conversion function could look like:
table[(addr >> shift) & mask] + addr;
You have here three possible variables: table, shift and mask. If you know
enough about the memory configuration, you can make them constants. On
m68k I maybe can make table and mask constants, the shift had to be
patched into the kernel. In this case it's quite simple, as it has to be
loaded into a register anyway this is enough:
static inline int getshift(void) __attribute__ ((const));
#define shift getshift()
In the ppc example I mentioned it's not that easy, because the instruction
has to be patched, which does the operation, so the generic operation:
#define ___pa(vaddr) ((vaddr)-PPC_MEMOFFSET)
becomes
#define ___pa(vaddr) (ADD(vaddr, PPC_MEMOFFSET))
ADD() would do the magic you see in asm-ppc/page.h.
For the lookup function above this means it becomes:
TABLE(SHIFT_AND(addr, shift, mask)) + addr
so that every operation could be directly patched.
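As an illustration of the lookup form above, here is a userspace sketch
with two made-up memory chunks. The chunk layout, offset_table and both
function names are assumptions for the example; it only shows how the
fully variable form and a version with constant shift and mask compute
the same result, not how m68k or ppc actually patch these values in.

```c
#include <stdint.h>

/* Hypothetical layout: two 16MB chunks of physical memory at 0x00000000
 * and 0x40000000, mapped back to back at virtual 0xC0000000 and
 * 0xC1000000. Each entry is (virtual base - physical base). */
static const uint32_t offset_table[2] = {
	0xC0000000u - 0x00000000u,
	0xC1000000u - 0x40000000u
};

/* Fully general form: table, shift and mask are all run-time values. */
static uint32_t phys_to_virt_generic(const uint32_t *table, unsigned int shift,
				     uint32_t mask, uint32_t addr)
{
	return table[(addr >> shift) & mask] + addr;
}

/* Arch-specialised form: the arch knows its layout, so shift and mask
 * become compile-time constants the compiler can fold. */
#define CHUNK_SHIFT	30
#define CHUNK_MASK	1u

static inline uint32_t phys_to_virt_const(uint32_t addr)
{
	return offset_table[(addr >> CHUNK_SHIFT) & CHUNK_MASK] + addr;
}
```

For physical address 0x40001000 both versions select entry 1 and yield
0xC1001000; the difference is only in how much the compiler knows.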
> Also, why is it
> attracting your attention? Is it creating significant overhead for you?
The current page_addr started it.
IMO it's better to just define it as:
#ifdef CONFIG_HIGHMEM
#define page_addr(p) ((p)->virtual)
#else
#define page_addr(p) page_to_virt(p)
#endif
or if you don't want the virtual member:
#define page_addr(p) (is_highpage(p) ? highpage_to_virt(p) : page_to_virt(p))
If I understand you correctly, the highpage_to_virt() function is what you
are really interested in.
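A userspace sketch of the second variant, to show the shape of the
dispatch. Everything about the high pages here is hypothetical: the
split at NR_LOW_PAGES and the kmap_virt[] table (0 meaning "not
mapped") are stand-ins invented for the example, not a description of
how highpage_to_virt() would really be implemented.

```c
#define PAGE_SHIFT	12
#define PAGE_OFFSET	0xC0000000UL
#define NR_PAGES	8
#define NR_LOW_PAGES	6	/* hypothetical: pages 6 and 7 are highmem */

struct page { unsigned long flags; };
static struct page mem_map[NR_PAGES];

/* Hypothetical per-highpage mapping slots; 0 means not mapped. */
static unsigned long kmap_virt[NR_PAGES - NR_LOW_PAGES];

/* Low pages: pure calculation from the mem_map index. */
#define page_to_virt(p) \
	((((unsigned long)((p) - mem_map)) << PAGE_SHIFT) + PAGE_OFFSET)

/* High pages: looked up, since they have no permanent mapping. */
#define is_highpage(p)		((p) - mem_map >= NR_LOW_PAGES)
#define highpage_to_virt(p)	(kmap_virt[((p) - mem_map) - NR_LOW_PAGES])

#define page_addr(p) \
	(is_highpage(p) ? highpage_to_virt(p) : page_to_virt(p))
```

In this model no struct page carries a virtual member; low pages cost a
shift and an add, and only high pages pay for a table access.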
bye, Roman
* Re: [PATCH] rmap 13a
2002-05-10 11:37 ` Roman Zippel
@ 2002-05-10 16:28 ` William Lee Irwin III
2002-05-10 19:48 ` Roman Zippel
0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2002-05-10 16:28 UTC (permalink / raw)
To: Roman Zippel; +Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
On Fri, May 10, 2002 at 01:37:50PM +0200, Roman Zippel wrote:
> Mapping everything into a single virtual area, so that the virtual address
> can be used as an index in the memmap array, e.g.
> #define virt_to_page(kaddr) (mem_map + (((unsigned long)(kaddr)-PAGE_OFFSET) >> PAGE_SHIFT))
> #define page_to_virt(page) ((((page) - mem_map) << PAGE_SHIFT) + PAGE_OFFSET)
This appears to be calculating it from a physical address...
On Thu, 9 May 2002, William Lee Irwin III wrote:
>> It seems reasonable to expect __va()/__pa() to come from arch code...
On Fri, May 10, 2002 at 01:37:50PM +0200, Roman Zippel wrote:
> A generic conversion function could look like:
> table[(addr >> shift) & mask] + addr;
> You have here three possible variables: table, shift and mask. If you know
> enough about the memory configuration, you can make them constants. On
> m68k I maybe can make table and mask constants, the shift had to be
> patched into the kernel. In this case it's quite simple, as it has to be
> loaded into a register anyway this is enough:
> static inline int getshift(void) __attribute__ ((const));
> #define shift getshift()
> In the ppc example I mentioned it's not that easy, because the instruction
> has to be patched, which does the operation, so the generic operation:
> #define ___pa(vaddr) ((vaddr)-PPC_MEMOFFSET)
> becomes
> #define ___pa(vaddr) (ADD(vaddr, PPC_MEMOFFSET))
> ADD() would do the magic you see in asm-ppc/page.h.
> For the lookup function above this means it becomes:
> TABLE(SHIFT_AND(addr, shift, mask)) + addr
> so that every operation could be directly patched.
This is the most interesting part, and appears very easy to genericize;
I can produce this in short order unless you have a particular interest
in doing it yourself (or have a patch waiting in the wings already).
On Thu, 9 May 2002, William Lee Irwin III wrote:
>> Also, why is it
>> attracting your attention? Is it creating significant overhead for you?
On Fri, May 10, 2002 at 01:37:50PM +0200, Roman Zippel wrote:
> The current page_addr started it.
This seems to beg the question.
On Fri, May 10, 2002 at 01:37:50PM +0200, Roman Zippel wrote:
> IMO it's better to just define it as:
> #ifdef CONFIG_HIGHMEM
> #define page_addr(p) ((p)->virtual)
> #else
> #define page_addr(p) page_to_virt(p)
> #endif
Highmem machines are the ones needing the space conservation the most,
yet they're the only ones who aren't allowed to omit ->virtual. There
is some irony here...
On Fri, May 10, 2002 at 01:37:50PM +0200, Roman Zippel wrote:
> or if you don't want the virtual member:
> #define page_addr(p) (is_highpage(p) ? highpage_to_virt(p) : page_to_virt(p))
> If I understand you correctly, the highpage_to_virt() function is what you
> are really interested in.
Not entirely. I'm very much in favor of space conservation even beyond
the particular interest of i386 highmem. I would like to kill ->virtual
entirely except as a very rarely used helper for superslow ALU's.
Maybe I should turn the question around instead, so I understand your
motivation better:
Why are you trying to hide physical addresses from the VM?
Cheers,
Bill
* Re: [PATCH] rmap 13a
2002-05-10 16:28 ` William Lee Irwin III
@ 2002-05-10 19:48 ` Roman Zippel
0 siblings, 0 replies; 23+ messages in thread
From: Roman Zippel @ 2002-05-10 19:48 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Christoph Hellwig, Rik van Riel, Samuel Ortiz, linux-mm
Hi,
William Lee Irwin III wrote:
> > Mapping everything into a single virtual area, so that the virtual address
> > can be used as an index in the memmap array, e.g.
> > #define virt_to_page(kaddr) (mem_map + (((unsigned long)(kaddr)-PAGE_OFFSET) >> PAGE_SHIFT))
> > #define page_to_virt(page) ((((page) - mem_map) << PAGE_SHIFT) + PAGE_OFFSET)
>
> This appears to be calculating it from a physical address...
Why?
> > For the lookup function above this means it becomes:
> > TABLE(SHIFT_AND(addr, shift, mask)) + addr
> > so that every operation could be directly patched.
>
> This is the most interesting part, and appears very easy to genericize;
> I can produce this in short order unless you have a particular interest
> in doing it yourself (or have a patch waiting in the wings already).
It obfuscates the thing more than it helps. Only very few machines
really need it to this extreme.
> Maybe I should turn the question around instead, so I understand your
> motivation better:
> Why are you trying to hide physical addresses from the VM?
Because it doesn't need it. The VM works mostly with the page structure
and converts that as needed. It doesn't need to know how it's done.
Currently there are only a few dependencies here, which gives us much
flexibility. I'm just afraid that by your generalization you create some
new rules for how something has to be implemented. Sometimes that is needed,
but we should only do this if we gain a real advantage from it and I
don't see any.
bye, Roman
end of thread, other threads:[~2002-05-10 19:48 UTC | newest]
Thread overview: 23+ messages
2002-05-07 2:17 [PATCH] rmap 13a Rik van Riel
2002-05-07 17:37 ` Christoph Hellwig
2002-05-07 18:03 ` William Lee Irwin III
2002-05-08 11:06 ` Samuel Ortiz
2002-05-08 11:13 ` William Lee Irwin III
2002-05-08 13:40 ` Rik van Riel
2002-05-08 18:21 ` Roman Zippel
2002-05-08 21:34 ` William Lee Irwin III
2002-05-08 22:34 ` Roman Zippel
2002-05-08 22:42 ` William Lee Irwin III
2002-05-08 22:50 ` William Lee Irwin III
2002-05-08 23:26 ` Roman Zippel
2002-05-09 1:29 ` William Lee Irwin III
2002-05-09 12:33 ` Roman Zippel
2002-05-09 14:09 ` William Lee Irwin III
2002-05-09 15:36 ` Roman Zippel
2002-05-09 17:42 ` William Lee Irwin III
2002-05-09 21:45 ` Roman Zippel
2002-05-09 23:13 ` William Lee Irwin III
2002-05-10 11:37 ` Roman Zippel
2002-05-10 16:28 ` William Lee Irwin III
2002-05-10 19:48 ` Roman Zippel
2002-05-08 21:50 ` William Lee Irwin III