linux-mm.kvack.org archive mirror
* pte_pagenr/MAP_NR deleted in pre6
@ 2000-08-10 17:18 Kanoj Sarcar
  2000-08-11  2:24 ` David S. Miller
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Kanoj Sarcar @ 2000-08-10 17:18 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: rmk, nico, davem, davidm, alan

Thought I would send out a quick note about a change I put into test6.
Basically, to make it easier to implement DISCONTIGMEM systems, the
concept of a page/mem_map number/index has been killed from the generic
(non architecture specific) parts of the kernel. This includes MAP_NR,
pte_pagenr and max_mapnr (although max_mapnr is used by a lot of 
architectures, it is not used by the generic kernel anymore).

Two new macros replace the above ones. The first is virt_to_page
(thusly named by Linus!), which takes a kernel direct mapped address
as input and provides the corresponding struct page. The other one is
VALID_PAGE(), which, given a page struct, determines whether it is a
valid page struct and represents _physical_ memory.
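
To make the two macros concrete, here is a minimal user-space sketch in C
of how they could behave on a flat, contiguous mem_map. It is not the code
from include/asm*/page.h: PAGE_OFFSET, NR_PAGES and the toy_ names are
assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_OFFSET ((uintptr_t)0xC0000000u) /* assumed start of the direct mapping */
#define NR_PAGES    1024u                    /* assumed size of the toy mem_map */

struct page {
    int count; /* placeholder; the real struct page has many more fields */
};

static struct page mem_map[NR_PAGES];

/* Toy counterpart of virt_to_page: direct mapped address -> struct page. */
static struct page *toy_virt_to_page(uintptr_t vaddr)
{
    uintptr_t index;

    assert(vaddr >= PAGE_OFFSET);
    index = (vaddr - PAGE_OFFSET) >> PAGE_SHIFT;
    assert(index < NR_PAGES);
    return &mem_map[index];
}

/* Toy counterpart of VALID_PAGE: does the pointer fall inside mem_map? */
static int toy_valid_page(const struct page *page)
{
    uintptr_t p = (uintptr_t)page;
    uintptr_t base = (uintptr_t)mem_map;

    return p >= base && p < base + sizeof(mem_map);
}
```

Note that callers of toy_virt_to_page() never see an index, only a
struct page pointer, which is the point of the change.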

Both virt_to_page and VALID_PAGE are in include/asm*/page.h. I have
tried to make sure there were no mistakes when making the changes for
the various architectures, but I am sure I goofed up a few cases, so 
apologies in advance. 

Also, as I have suggested before, the pte_page implementation in
sparc/sparc64 should be cleaned up, as should the usages of MAP_NR in
the arm code. Russell, Linus has not yet put in the final patch that
will allow DISCONTIGMEM systems to lay out their mem_map arrays however
they see fit. I have resent it to him; once it is in, we can get down
to simplifying most of the DISCONTIG arch code.

Thanks.

Kanoj

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/


* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-10 17:18 pte_pagenr/MAP_NR deleted in pre6 Kanoj Sarcar
@ 2000-08-11  2:24 ` David S. Miller
  2000-08-14  0:34   ` Anton Blanchard
  2000-08-11 11:50 ` Roman Zippel
  2000-08-15 16:19 ` Stephen C. Tweedie
  2 siblings, 1 reply; 36+ messages in thread
From: David S. Miller @ 2000-08-11  2:24 UTC (permalink / raw)
  To: kanoj; +Cc: linux-mm, linux-kernel, rmk, nico, davidm, alan

   Also, as I have suggested before, the pte_page implementation in
   sparc/sparc64 should be cleaned up

I took care of sparc64 and have asked Anton to deal with sparc32.

Later,
David S. Miller
davem@redhat.com

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-10 17:18 pte_pagenr/MAP_NR deleted in pre6 Kanoj Sarcar
  2000-08-11  2:24 ` David S. Miller
@ 2000-08-11 11:50 ` Roman Zippel
  2000-08-11 13:20   ` Russell King
  2000-08-15 16:19 ` Stephen C. Tweedie
  2 siblings, 1 reply; 36+ messages in thread
From: Roman Zippel @ 2000-08-11 11:50 UTC (permalink / raw)
  To: Kanoj Sarcar; +Cc: linux-mm, linux-kernel

Hi,

> Also, as I have suggested before, the pte_page implementation in
> sparc/sparc64 should be cleaned up, as should the usages of MAP_NR in
> the arm code. Russell, Linus has not yet put in the final patch that
> will allow DISCONTIGMEM systems to lay out their mem_map arrays however
> they see fit. I have resent it to him; once it is in, we can get down
> to simplifying most of the DISCONTIG arch code.

Can you send me that patch? I'd like to check it, if it can be used for
the m68k port. m68k still has its own support for discontiguous mem and
from what I've seen so far I'm not really convinced yet to give it up.
Small summary: m68k maps everything together into one virtual mapping
and uses the virtual address as index into memmap. That has the
advantage that the address conversion stuff is concentrated in
__va/__pa and the rest stays simple (e.g. we don't have to deal with
multiple nodes). The only problem is that the generic code must not
assume that a mem zone is a physically contiguous area (which already
mostly holds; there are currently only two places that assume it, and
they are easy to fix).
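
A rough user-space sketch in C of that scheme: several physical chunks
are packed back to back into one virtual range, and the virtual address
alone indexes memmap. The chunk table, the virtual base of 0 and all toy_
names are assumptions for illustration, not the actual m68k code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NR_CHUNKS  2

struct toy_chunk {
    uint32_t phys_start; /* physical start address of the chunk */
    uint32_t size;       /* chunk size in bytes */
};

/* Hypothetical memory layout: two chunks with a hole between them. */
static const struct toy_chunk chunks[NR_CHUNKS] = {
    { 0x00000000u, 0x00200000u }, /* 2 MB */
    { 0x08000000u, 0x00400000u }, /* 4 MB */
};

/* Toy __va: physical -> virtual. Returns 0 on success, -1 if unmapped. */
static int toy_va(uint32_t phys, uint32_t *virt)
{
    uint32_t base = 0; /* virtual start of the current chunk */
    size_t i;

    assert(virt != NULL);
    for (i = 0; i < NR_CHUNKS; i++) {
        if (phys >= chunks[i].phys_start &&
            phys - chunks[i].phys_start < chunks[i].size) {
            *virt = base + (phys - chunks[i].phys_start);
            return 0;
        }
        base += chunks[i].size;
    }
    return -1;
}

/* Toy __pa: virtual -> physical. Returns 0 on success, -1 if unmapped. */
static int toy_pa(uint32_t virt, uint32_t *phys)
{
    uint32_t base = 0;
    size_t i;

    assert(phys != NULL);
    for (i = 0; i < NR_CHUNKS; i++) {
        if (virt - base < chunks[i].size) {
            *phys = chunks[i].phys_start + (virt - base);
            return 0;
        }
        base += chunks[i].size;
    }
    return -1;
}

/* The memmap index depends only on the virtual address. */
static uint32_t toy_memmap_index(uint32_t virt)
{
    return virt >> PAGE_SHIFT;
}
```

All knowledge of the holes lives in toy_va()/toy_pa(); the index
computation stays a plain shift.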

bye, Roman

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-11 11:50 ` Roman Zippel
@ 2000-08-11 13:20   ` Russell King
  2000-08-11 14:56     ` Roman Zippel
  2000-08-11 17:21     ` Kanoj Sarcar
  0 siblings, 2 replies; 36+ messages in thread
From: Russell King @ 2000-08-11 13:20 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Kanoj Sarcar, linux-mm, linux-kernel

Roman Zippel writes:
> Can you send me that patch? I'd like to check it, if it can be used for
> the m68k port. m68k still has its own support for discontiguous mem and
> from what I've seen so far I'm not really convinced yet to give it up.

I don't see anything wrong in continuing with this.  ARM also does
this in addition to support for the discontig mem stuff.  Why?

The general discontig code is ok so long as you have a lot of RAM
in node 0.  However, since all allocations currently come from node
0, if this node is small, then there is a chance that you will run
out of memory at bootup, and then not be able to continue (and
because we both use fbcon, there is no message visible to the user,
and hence no diagnostics).

Continuing with the single node but many "areas" that ARM follows, and
from what it sounds like m68k does, means that you can allocate from
any "area", and therefore don't hit this restriction.

One way out of this would be if the NUMA stuff can have the "allocations
only from node 0" feature turned off, and then I'd be happy to let the
ARM version be replaced totally by the discontig case.
   _____
  |_____| ------------------------------------------------- ---+---+-
  |   |         Russell King        rmk@arm.linux.org.uk      --- ---
  | | | | http://www.arm.linux.org.uk/personal/aboutme.html   /  /  |
  | +-+-+                                                     --- -+-
  /   |               THE developer of ARM Linux              |+| /|\
 /  | | |                                                     ---  |
    +-+-+ -------------------------------------------------  /\\\  |

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-11 13:20   ` Russell King
@ 2000-08-11 14:56     ` Roman Zippel
  2000-08-12  9:18       ` Bjorn Wesen
  2000-08-11 17:21     ` Kanoj Sarcar
  1 sibling, 1 reply; 36+ messages in thread
From: Roman Zippel @ 2000-08-11 14:56 UTC (permalink / raw)
  To: Russell King; +Cc: Kanoj Sarcar, linux-mm, linux-kernel

Russell King wrote:

> > Can you send me that patch? I'd like to check it, if it can be used for
> > the m68k port. m68k still has its own support for discontiguous mem and
> > from what I've seen so far I'm not really convinced yet to give it up.
> 
> I don't see anything wrong in continuing with this.  ARM also does
> this in addition to support for the discontig mem stuff.  Why?

My problem is that I'm not really familiar with the high memory support.
The problem here is that the relation between virtual address / physical
address / page struct / memmap+index is hardly documented and it gets
more interesting when a page struct might also represent an i/o area
(for direct i/o).

> The general discontig code is ok so long as you have a lot of RAM
> in node 0.  However, since all allocations currently come from node
> 0, if this node is small, then there is a chance that you will run
> out of memory at bootup, and then not be able to continue (and
> because we both use fbcon, there is no message visible to the user,
> and hence no diagnostics).

Another problem on m68k: I can make almost no assumption about the
memory layout to play some clever tricks. If I remember correctly I had
some problems with the memmap layout, since lots of code assumed a
contiguous memmap and there were some tricks to get the above
relationship right.

bye, Roman

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-11 13:20   ` Russell King
  2000-08-11 14:56     ` Roman Zippel
@ 2000-08-11 17:21     ` Kanoj Sarcar
  2000-08-14  9:29       ` Roman Zippel
  1 sibling, 1 reply; 36+ messages in thread
From: Kanoj Sarcar @ 2000-08-11 17:21 UTC (permalink / raw)
  To: Russell King; +Cc: Roman Zippel, linux-mm, linux-kernel

> 
> Roman Zippel writes:
> > Can you send me that patch? I'd like to check it, if it can be used for

http://oss.sgi.com/projects/numa/download/map.patch

And even if it doesn't help m68k, it definitely will help mips64, ia64 
and ARM (from what I am understanding from Russell). So, unless it is
_breaking_ m68k, I would rather see the patch go in ...

> > the m68k port. m68k still has its own support for discontiguous mem and
> > from what I've seen so far I'm not really convinced yet to give it up.
> 
> I don't see anything wrong in continuing with this.  ARM also does
> this in addition to support for the discontig mem stuff.  Why?
> 
> The general discontig code is ok so long as you have a lot of RAM
> in node 0.  However, since all allocations currently come from node
> 0, if this node is small, then there is a chance that you will run
> out of memory at bootup, and then not be able to continue (and
> because we both use fbcon, there is no message visible to the user,
> and hence no diagnostics).

Note: the biggest component of bootmem allocation is the mem_map for
that node, which happens on specific nodes. I agree, other allocations
happen out of node 0, but if there is a chance that on specific architectures
we might run out of memory on node 0, we can fix this, although I would
like to hear details offline ...

> 
> Continuing with the single node but many "areas" that ARM follows, and
> from what it sounds like m68k does, means that you can allocate from
> any "area", and therefore don't hit this restriction.
> 
> One way out of this would be if the NUMA stuff can have the "allocations
> only from node 0" feature turned off, and then I'd be happy to let the
> ARM version be replaced totally by the discontig case.

This is not NUMA, this is regular DISCONTIG. One option while doing 
alloc_bootmem (ie, no node specified), is to do the allocation from node 
0, since no other node can be guaranteed to exist.

If this sounds too constricting, we can modify alloc_bootmem to try 
allocating from all nodes for which alloc_bootmem_node() has already
been done. Shouldn't be too hard to implement and the changes are
completely in the bootmem allocator code. Let's talk offline (along
with Roman) if you are interested.
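
As a rough illustration of that fallback (a user-space toy in C, not the
real bootmem allocator): the node-agnostic call tries node 0 first and
then any other node that has already been initialized. The toy_ names and
the byte-counter model of a node are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

struct toy_node {
    int    initialized; /* set once the node's bootmem has been set up */
    size_t free_bytes;  /* bytes still available on this node */
};

static struct toy_node nodes[MAX_NODES];

/* Register a node with the toy allocator. */
static void toy_init_node(int nid, size_t bytes)
{
    assert(nid >= 0 && nid < MAX_NODES);
    nodes[nid].initialized = 1;
    nodes[nid].free_bytes = bytes;
}

/* Node-specific allocation: returns 0 on success, -1 on failure. */
static int toy_alloc_node(int nid, size_t size)
{
    assert(nid >= 0 && nid < MAX_NODES);
    if (!nodes[nid].initialized || nodes[nid].free_bytes < size)
        return -1;
    nodes[nid].free_bytes -= size;
    return 0;
}

/*
 * Node-agnostic allocation: node 0 first, then every other initialized
 * node in order. Returns the node that satisfied the request, or -1.
 */
static int toy_alloc_bootmem(size_t size)
{
    int nid;

    for (nid = 0; nid < MAX_NODES; nid++)
        if (toy_alloc_node(nid, size) == 0)
            return nid;
    return -1;
}
```

With a small node 0, a request that no longer fits there is served from
the next initialized node instead of failing at boot.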

Kanoj


* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-11 14:56     ` Roman Zippel
@ 2000-08-12  9:18       ` Bjorn Wesen
  0 siblings, 0 replies; 36+ messages in thread
From: Bjorn Wesen @ 2000-08-12  9:18 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-mm, linux-kernel

On Fri, 11 Aug 2000, Roman Zippel wrote:
> The problem here is that the relation between virtual address / physical
> address / page struct / memmap+index is hardly documented and it gets
> more interesting when a page struct might also represent an i/o area

Amen to that - I'm doing a 2.4 port currently and our architecture has all
DRAM at a pseudo-physical address 0xc0000000. Figuring out how to not make
mem_map start at 0 and waste a lot of struct page's to cover everything up
to 0xc0000000 and beyond, and what the __pa/__va things should do with
regard to the pseudo-0xc0000000 took some hours of groping around the archs and
bootmem/zone code :) then it suddenly worked.. and like, "wow, don't touch
it again!" :) 

(luckily I found a comment in mm/numa.c about exactly that and that m68k
and arm used it - you could never have been led to believe that looking
through the non-commented source :) 

The relationships between virtual/logical/physical etc. can be extremely
confusing - our CPU has physical DRAM at 0x40000000 but it is segmented
into 0xc0000000 in kernel-mode, while the paged virtual memory is at 0.
Heh. Fortunately the 0x40000000 business can be largely ignored since it
is only visible inside the TLB - for all other purposes the DRAM is at
0xc... 

So what I ended up doing was to make __pa/__va convert between 0xc.. and
0x4.., put PAGE_OFFSET == 0xc.., max/min_low_pfn's at 0xc..., mem_map
indexes start at 0 (corresponding to 0xc).. seems to work so far :) 
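
In C, the arrangement described above might look roughly like this. It is
a user-space sketch built only from the addresses mentioned in this
message; PHYS_BASE and the toy_ names are made up for illustration and do
not come from the port itself.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_OFFSET 0xC0000000u /* where kernel mode sees the DRAM */
#define PHYS_BASE   0x40000000u /* where the DRAM physically sits (TLB view) */

/* Toy __pa: kernel segment address (0xc...) -> physical address (0x4...). */
static uint32_t toy_pa(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET);
    return vaddr - PAGE_OFFSET + PHYS_BASE;
}

/* Toy __va: physical address (0x4...) -> kernel segment address (0xc...). */
static uint32_t toy_va(uint32_t paddr)
{
    assert(paddr >= PHYS_BASE && paddr < PHYS_BASE + (0u - PAGE_OFFSET));
    return paddr - PHYS_BASE + PAGE_OFFSET;
}

/* mem_map index 0 corresponds to 0xc0000000, so no struct pages are wasted. */
static uint32_t toy_map_index(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET);
    return (vaddr - PAGE_OFFSET) >> PAGE_SHIFT;
}
```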

It does not help of course that all archs do the bootmem and zone
initialization in their own ways :) 

-Bjorn



* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-11  2:24 ` David S. Miller
@ 2000-08-14  0:34   ` Anton Blanchard
  0 siblings, 0 replies; 36+ messages in thread
From: Anton Blanchard @ 2000-08-14  0:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: kanoj, linux-mm, linux-kernel, rmk, nico, davidm, alan

 
> I took care of sparc64 and have asked Anton to deal with sparc32.

sparc32 has also been fixed.

Anton

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-11 17:21     ` Kanoj Sarcar
@ 2000-08-14  9:29       ` Roman Zippel
  0 siblings, 0 replies; 36+ messages in thread
From: Roman Zippel @ 2000-08-14  9:29 UTC (permalink / raw)
  To: Kanoj Sarcar; +Cc: Russell King, linux-mm, linux-kernel

Hi,

> And even if it doesn't help m68k, it definitely will help mips64, ia64
> and ARM (from what I am understanding from Russell). So, unless it is
> _breaking_ m68k, I would rather see the patch go in ...

No, it doesn't :) and I think I can start thinking about making it usable
under m68k.

bye, Roman

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-10 17:18 pte_pagenr/MAP_NR deleted in pre6 Kanoj Sarcar
  2000-08-11  2:24 ` David S. Miller
  2000-08-11 11:50 ` Roman Zippel
@ 2000-08-15 16:19 ` Stephen C. Tweedie
  2000-08-16  8:25   ` Roman Zippel
  2 siblings, 1 reply; 36+ messages in thread
From: Stephen C. Tweedie @ 2000-08-15 16:19 UTC (permalink / raw)
  To: Kanoj Sarcar; +Cc: linux-mm, linux-kernel, rmk, nico, davem, davidm, alan

Hi,

On Thu, Aug 10, 2000 at 10:18:49AM -0700, Kanoj Sarcar wrote:
> Thought I would send out a quick note about a change I put into test6.
> Basically, to make it easier to implement DISCONTIGMEM systems, the
> concept of a page/mem_map number/index has been killed from the generic
> (non architecture specific) parts of the kernel.

Excellent, this will make it _tons_ easier for me to create new zones
of mem_map arrays on the fly to allow us to create struct pages for
PCI IO-aperture memory (necessary for kiobuf mappings of IO memory).

Thanks!

--Stephen

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-15 16:19 ` Stephen C. Tweedie
@ 2000-08-16  8:25   ` Roman Zippel
  2000-08-16 17:13     ` Kanoj Sarcar
  2000-08-16 18:17     ` Stephen C. Tweedie
  0 siblings, 2 replies; 36+ messages in thread
From: Roman Zippel @ 2000-08-16  8:25 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Kanoj Sarcar, linux-mm, linux-kernel, rmk, nico, davem, davidm, alan

Hi,

> Excellent, this will make it _tons_ easier for me to create new zones
> of mem_map arrays on the fly to allow us to create struct pages for
> PCI IO-aperture memory (necessary for kiobuf mappings of IO memory).

A related question: do you already have an idea what the driver interface
for that could look like? I mean, some drivers need a virtual address,
some need the physical address for dma and some of them might need
bounce buffers. E.g. I don't know how to get (quickly) from a page
struct which represents an io mapping to the physical address. Will we
add some generic functions for this which can be used by drivers, or even
let the drivers only specify their requirements so that the buffer code
generates an appropriate io request? I have a few ideas, but I don't know
whether concrete plans already exist.

bye, Roman

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-16  8:25   ` Roman Zippel
@ 2000-08-16 17:13     ` Kanoj Sarcar
  2000-08-16 18:20       ` Stephen C. Tweedie
  2000-08-16 18:17     ` Stephen C. Tweedie
  1 sibling, 1 reply; 36+ messages in thread
From: Kanoj Sarcar @ 2000-08-16 17:13 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Stephen C. Tweedie, linux-mm, linux-kernel, rmk, nico, davem,
	davidm, alan

> 
> Hi,
> 
> > Excellent, this will make it _tons_ easier for me to create new zones
> > of mem_map arrays on the fly to allow us to create struct pages for
> > PCI IO-aperture memory (necessary for kiobuf mappings of IO memory).
> 
> A related question: do you already have an idea what the driver interface
> for that could look like? I mean, some drivers need a virtual address,
> some need the physical address for dma and some of them might need
> bounce buffers. E.g. I don't know how to get (quickly) from a page
> struct which represents an io mapping to the physical address. Will we
> add some generic functions for this which can be used by drivers, or even
> let the drivers only specify their requirements so that the buffer code
> generates an appropriate io request? I have a few ideas, but I don't know
> whether concrete plans already exist.
> 
> bye, Roman
> 

FWIW, Linus was mildly suggesting I implement page_to_phys, to complement
virt_to_page. I didn't see an immediate need for it, so I just did the
bit I am interested in for now. If you look, most of the mk_pte() definitions
should actually use page_to_phys ...

Of course, I am talking about struct pages that represent memory, not io
devices, I don't think either one of us was thinking about that ...

I also thought about whether page_to_phys would be useful for drivers,
decided against it, since the PCI-DMA apis which are quite a standard
now want to go to the PCI bus addresses, instead of physical addresses.

BTW, I am not sure I understand when you say "some drivers need a virtual 
address, some need the physical address for dma and some of them might need
bounce buffers". I believe, the goal should be to pass in either a. struct
page or b. physical address, then the driver makes the PCI-DMA calls to 
determine whether it can dma directly into the page (or transparently 
get a page which it can dma into, and the PCI-DMA layer handles the 
bouncing completely). Passing in virtual addresses into drivers is not
good, if you think about the i386 class machines which can not direct 
map the entire memory (hence would need kmap addresses for high pages).

Finally, whether the drivers accept virtual addresses or struct pages, 
they should not be trying to interpret their input, rather treat the input
as opaque cookies, to be passed on to the PCI-DMA layer ...

Kanoj

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-16  8:25   ` Roman Zippel
  2000-08-16 17:13     ` Kanoj Sarcar
@ 2000-08-16 18:17     ` Stephen C. Tweedie
  1 sibling, 0 replies; 36+ messages in thread
From: Stephen C. Tweedie @ 2000-08-16 18:17 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Stephen C. Tweedie, Kanoj Sarcar, linux-mm, linux-kernel, rmk,
	nico, davem, davidm, alan

Hi,

On Wed, Aug 16, 2000 at 10:25:08AM +0200, Roman Zippel wrote:
> 
> > Excellent, this will make it _tons_ easier for me to create new zones
> > of mem_map arrays on the fly to allow us to create struct pages for
> > PCI IO-aperture memory (necessary for kiobuf mappings of IO memory).
> 
> A related question: do you already have an idea what the driver interface
> for that could look like? I mean, some drivers need a virtual address,
> some need the physical address for dma and some of them might need
> bounce buffers.

It's even more complicated than that --- you can't even assume that
the pages concerned have got valid pointers in _any_ address space,
because they might be high memory pages on PAE36 which exist above the
4GB boundary and which aren't mapped into virtual memory anywhere.

We will need to make sure that there is a clean way to convert any
struct page * into (a) a kernel virtual address (that's easy, kmap()
does it already); (b) a physical address (which can be translated
easily into a bus address); or (c) a page frame number which can
identify pages above 4GB even though a ulong pointer/address can't
cope with such pages as addresses directly.  
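
To see why (c) matters, consider this small C sketch (user-space, with
made-up toy_ helper names): with 4K pages, a 32-bit page frame number can
name any page of a 36-bit physical address space, while the full physical
address needs a 64-bit type.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Page frame number -> physical address; the result may exceed 32 bits. */
static uint64_t toy_pfn_to_phys(uint32_t pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT;
}

/* Physical address -> page frame number; fits in 32 bits for PAE36. */
static uint32_t toy_phys_to_pfn(uint64_t phys)
{
    assert((phys >> PAGE_SHIFT) <= UINT32_MAX);
    return (uint32_t)(phys >> PAGE_SHIFT);
}

/* Does the page start at or above the 4GB boundary? */
static int toy_above_4gb(uint32_t pfn)
{
    return toy_pfn_to_phys(pfn) >= ((uint64_t)1 << 32);
}
```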

However, the kiobuf code will not do anything fancy with _any_ of this
--- it will continue just to carry struct page *s.  It will be up to
the users of the kiobufs to do anything further with them.  I already
have bounce buffer support for kiobufs in 2.2 (as a quick hack to let
highmem raw IO work on 2.2; 2.4 is much cleaner and doesn't need that
particular hack).  I'll make sure that 2.4 has a clean way of doing
bounce buffers too, probably by means of a clone_kiobuf() function
which creates a new kiobuf by cloning the pages of the original if
they satisfy some constraint (such as <1GB, <4GB), and
pre/post-copying them if they do not.
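
The constraint check behind such a clone could look roughly like the C
sketch below. Everything in it is hypothetical: clone_kiobuf() is only
proposed above, pages are represented by bare page frame numbers, the
bounce pool is a plain array, and the pre/post-copying of data is left
out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/*
 * Copy src[] to dst[], replacing every page that does not lie completely
 * below `limit` (a physical address such as 1GB or 4GB) with a page taken
 * from the bounce pool. Returns the number of bounce pages used, or -1 if
 * the pool ran out.
 */
static int toy_clone_with_bounce(const uint32_t *src, uint32_t *dst, size_t n,
                                 uint64_t limit,
                                 const uint32_t *pool, size_t pool_n)
{
    size_t i, used = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        /* physical address one past the last byte of this page */
        uint64_t end = ((uint64_t)src[i] + 1) << PAGE_SHIFT;

        if (end <= limit) {
            dst[i] = src[i];       /* page satisfies the constraint */
        } else {
            if (used == pool_n)
                return -1;         /* no bounce page left */
            dst[i] = pool[used++]; /* substitute a bounce page */
        }
    }
    return (int)used;
}
```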

Cheers,
 Stephen

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-16 17:13     ` Kanoj Sarcar
@ 2000-08-16 18:20       ` Stephen C. Tweedie
  2000-08-16 18:24         ` David S. Miller
                           ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Stephen C. Tweedie @ 2000-08-16 18:20 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: Roman Zippel, Stephen C. Tweedie, linux-mm, linux-kernel, rmk,
	nico, davem, davidm, alan

Hi,

On Wed, Aug 16, 2000 at 10:13:21AM -0700, Kanoj Sarcar wrote:
> 
> FWIW, Linus was mildly suggesting I implement page_to_phys, to complement
> virt_to_page.

It's part of what is necessary if we want to push kiobufs into the
driver layers.  page_to_pfn is needed for PAE36 support so that
PCI64 or dual-address-cycle drivers can handle physical addresses
longer than 32 bits.

> BTW, I am not sure I understand when you say "some drivers need a virtual 
> address, some need the physical address for dma and some of them might need
> bounce buffers". I believe, the goal should be to pass in either a. struct
> page or b. physical address

Yes, but different drivers have different requirements on those struct
page *s.  Drivers which do programmed IO need to be able to turn the
page into a kernel virtual address.  Drivers which can access >32-bit
addresses need to turn the page into an index which fits inside 32
bits.  Drivers which do DMA but only to <4GB addresses need bounce
buffers.

That is irrelevant as far as the kiobuf data structure is concerned,
but it is very important for the internals of the drivers, so this
sort of functionality must be made available for drivers to use
internally as needed.

Cheers, 
 Stephen

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-16 18:20       ` Stephen C. Tweedie
@ 2000-08-16 18:24         ` David S. Miller
  2000-08-16 19:53           ` Stephen C. Tweedie
  2000-08-16 18:47         ` Kanoj Sarcar
  2000-08-16 22:22         ` Kanoj Sarcar
  2 siblings, 1 reply; 36+ messages in thread
From: David S. Miller @ 2000-08-16 18:24 UTC (permalink / raw)
  To: sct; +Cc: kanoj, roman, linux-mm, linux-kernel, rmk, nico, davidm, alan

   Drivers which do DMA but only to <4GB addresses need bounce
   buffers.

This is only true in an architecture specific sense (i.e. x86 systems
are the ones which have this particular restriction).

Which is one of the reasons I wish the bounce buffer stuff went into
the place it belongs, behind the pci_dma API.  If we move to a
page+offset model for drivers, we could do exactly this and also
handle cases like ix86 PAE.

Later,
David S. Miller
davem@redhat.com

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-16 18:47         ` Kanoj Sarcar
@ 2000-08-16 18:39           ` David S. Miller
  2000-08-16 19:30             ` Stephen C. Tweedie
  0 siblings, 1 reply; 36+ messages in thread
From: David S. Miller @ 2000-08-16 18:39 UTC (permalink / raw)
  To: kanoj; +Cc: sct, roman

   I guess finally, drivers will either get one or a list of

   1. struct page or

Make this "struct page and offset", a page is not enough by itself to
indicate all the necessary information, you need an offset within the
page as well.

Later,
David S. Miller
davem@redhat.com

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-16 18:20       ` Stephen C. Tweedie
  2000-08-16 18:24         ` David S. Miller
@ 2000-08-16 18:47         ` Kanoj Sarcar
  2000-08-16 18:39           ` David S. Miller
  2000-08-16 22:22         ` Kanoj Sarcar
  2 siblings, 1 reply; 36+ messages in thread
From: Kanoj Sarcar @ 2000-08-16 18:47 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Roman Zippel, linux-mm, linux-kernel, rmk, nico, davem, davidm, alan

> 
> Hi,
> 
> On Wed, Aug 16, 2000 at 10:13:21AM -0700, Kanoj Sarcar wrote:
> > 
> > FWIW, Linus was mildly suggesting I implement page_to_phys, to complement
> > virt_to_page.
> 
> It's part of what is necessary if we want to push kiobufs into the
> driver layers.  page_to_pfn is needed for PAE36 support so that
> PCI64 or dual-address-cycle drivers can handle physical addresses
> longer than 32 bits.
> 
> > BTW, I am not sure I understand when you say "some drivers need a virtual 
> > address, some need the physical address for dma and some of them might need
> > bounce buffers". I believe, the goal should be to pass in either a. struct
> > page or b. physical address
> 
> Yes, but different drivers have different requirements on those struct
> page *s.  Drivers which do programmed IO need to be able to turn the
> page into a kernel virtual address.  Drivers which can access >32-bit
> addresses need to turn the page into an index which fits inside 32
> bits.  Drivers which do DMA but only to <4GB addresses need bounce
> buffers.
> 
> That is irrelevant as far as the kiobuf data structure is concerned,
> but it is very important for the internals of the drivers, so this
> sort of functionality must be made available for drivers to use
> internally as needed.
> 
> Cheers, 
>  Stephen
> 

It might be easier all around if we could all agree on what drivers
need to do. As David Miller points out, whether a driver can dma into
>32-bit addresses etc is also a function of the architecture, so this
is best hidden under the per architecture PCI-DMA layer. So, if the driver
writer codes according to this, he will transparently get the best
performance for any architecture ...

I guess finally, drivers will either get one or a list of

1. struct page or
2. pfn or
3. paddr_t (unsigned long long on PAE36, unsigned long on other platforms)

The PCI-DMA layer should be able to handle this type of input. The driver
must not attempt to convert this to PCI bus addresses. The driver must call
an architecture hook, like kmap(), to get a kernel virtual address for
the underlying page. It should be able to do without needing the physical
address of the page, the PCI-DMA routines will know how to do that.

kiobufs might need to get some hooks into PCI-DMA, but shouldn't this
suffice, mostly? Or is this being too restrictive for some drivers?

Kanoj

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-16 18:39           ` David S. Miller
@ 2000-08-16 19:30             ` Stephen C. Tweedie
  0 siblings, 0 replies; 36+ messages in thread
From: Stephen C. Tweedie @ 2000-08-16 19:30 UTC (permalink / raw)
  To: David S. Miller
  Cc: kanoj, sct, roman, linux-mm, linux-kernel, rmk, nico, davidm, alan

Hi,

On Wed, Aug 16, 2000 at 11:39:17AM -0700, David S. Miller wrote:
> 
>    I guess finally, drivers will either get one or a list of
> 
>    1. struct page or
> 
> Make this "struct page and offset", a page is not enough by itself to
> indicate all the necessary information, you need an offset within the
> page as well.

That's exactly what a kiobuf is --- a vector of struct page *s, plus
an arbitrary offset and length.
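
A simplified C model of that idea, with field and function names that are
illustrative rather than the actual kiobuf definition: a page vector plus
one offset and one length is enough to derive a (page, offset, length)
triple for every page touched, which is the shape a scatterlist wants.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

struct page {
    int dummy; /* stand-in for the real struct page */
};

struct toy_kiobuf {
    struct page **pages;    /* vector of struct page pointers */
    size_t        nr_pages; /* entries in the vector */
    size_t        offset;   /* byte offset into the first page */
    size_t        length;   /* total number of bytes described */
};

struct toy_segment {
    struct page *page;
    size_t       offset;
    size_t       length;
};

/*
 * Split the buffer into per-page segments. Returns the number of segments
 * written, or -1 if the description is inconsistent or segs[] is too small.
 */
static int toy_kiobuf_to_segments(const struct toy_kiobuf *kb,
                                  struct toy_segment *segs, size_t max_segs)
{
    size_t remaining, off, i, n = 0;

    assert(kb != NULL && segs != NULL);
    remaining = kb->length;
    off = kb->offset;
    if (off >= PAGE_SIZE)
        return -1;

    for (i = 0; i < kb->nr_pages && remaining > 0; i++) {
        size_t chunk = PAGE_SIZE - off; /* bytes left in this page */

        if (chunk > remaining)
            chunk = remaining;
        if (n == max_segs)
            return -1;
        segs[n].page = kb->pages[i];
        segs[n].offset = off;
        segs[n].length = chunk;
        n++;
        remaining -= chunk;
        off = 0; /* only the first page carries an offset */
    }
    return remaining == 0 ? (int)n : -1;
}
```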

--Stephen

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-16 18:24         ` David S. Miller
@ 2000-08-16 19:53           ` Stephen C. Tweedie
  0 siblings, 0 replies; 36+ messages in thread
From: Stephen C. Tweedie @ 2000-08-16 19:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: sct, kanoj, roman, linux-mm, linux-kernel, rmk, nico, davidm, alan

Hi,

On Wed, Aug 16, 2000 at 11:24:23AM -0700, David S. Miller wrote:
> 
> Which is one of the reasons I wish the bounce buffer stuff went into
> the place it belongs, behind the pci_dma API.  If we move to a
> page+offset model for drivers, we could do exactly this and also
> handle cases like ix86 PAE.

Fine --- it is pretty easy to generate a scatterlist from a kiobuf so
that drivers using a kiovec API can use the existing pci_dma support.
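
As a rough illustration of that conversion, here is a minimal userspace
sketch. The names kiobuf_sketch, sg_sketch and kiobuf_to_sg are
hypothetical stand-ins, not the kernel's real kiobuf or scatterlist
definitions, and a 4KB page size is assumed:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u              /* assumed page size for this sketch */

struct page { int unused; };         /* stand-in for the kernel's struct page */

/* Simplified, hypothetical kiobuf: a page vector plus offset and length. */
struct kiobuf_sketch {
	struct page **pages;         /* one entry per page the buffer touches */
	size_t nr_pages;
	size_t offset;               /* byte offset into pages[0] */
	size_t length;               /* total bytes described */
};

/* Hypothetical scatterlist entry in the page+offset style. */
struct sg_sketch {
	struct page *page;
	size_t offset;
	size_t length;
};

/*
 * Emit one entry per page covered by the buffer. Returns the number of
 * entries written; 0 means an empty buffer, too few pages, or sg[] too small.
 */
static size_t kiobuf_to_sg(const struct kiobuf_sketch *kb,
			   struct sg_sketch *sg, size_t max_sg)
{
	size_t remaining = kb->length;
	size_t offset = kb->offset;
	size_t n = 0;

	assert(kb->offset < PAGE_SIZE);

	while (remaining > 0) {
		size_t chunk = PAGE_SIZE - offset;

		if (chunk > remaining)
			chunk = remaining;
		if (n >= kb->nr_pages || n >= max_sg)
			return 0;

		sg[n].page = kb->pages[n];
		sg[n].offset = offset;
		sg[n].length = chunk;

		remaining -= chunk;
		offset = 0;          /* pages after the first start at byte 0 */
		n++;
	}
	return n;
}
```

Only the first entry carries a non-zero offset; a 5000-byte buffer
starting at offset 100 splits into segments of 3996 and 1004 bytes.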

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-16 18:20       ` Stephen C. Tweedie
  2000-08-16 18:24         ` David S. Miller
  2000-08-16 18:47         ` Kanoj Sarcar
@ 2000-08-16 22:22         ` Kanoj Sarcar
  2000-08-17  9:11           ` Stephen C. Tweedie
  2 siblings, 1 reply; 36+ messages in thread
From: Kanoj Sarcar @ 2000-08-16 22:22 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Roman Zippel, linux-mm, linux-kernel, rmk, nico, davem, davidm, alan

> 
> Hi,
> 
> On Wed, Aug 16, 2000 at 10:13:21AM -0700, Kanoj Sarcar wrote:
> > 
> > FWIW, Linus was mildly suggesting I implement page_to_phys, to complement
> > virt_to_page.
> 
> It's part of what is necessary if we want to push kiobufs into the
> driver layers.  page_to_pfn is needed for PAE36 support so that
> PCI64 or dual-address-cycle drivers can handle physical addresses
> longer than 32 bits.
>

While we are on this topic, something like

#define page_to_phys(page) \
	((((page)-(page)->zone->zone_mem_map) << PAGE_SHIFT) \
	+ ((page)->zone->zone_start_paddr))

should work on all platforms on 2.4. (You might have to add in an
unsigned long long somewhere in there for PAE36).
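
To make the arithmetic concrete, here is a small userspace mock. The
macro is copied from above; the two structs are cut down to the fields
the macro touches (the real 2.4 structures have many more), and
PAGE_SHIFT of 12 and the helper page_index_in_zone are assumptions of
this sketch:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12   /* 4KB pages, as on i386; assumed here */

struct page;

/* Only the zone fields the macro uses. */
struct zone {
	struct page *zone_mem_map;       /* first struct page of the zone */
	unsigned long zone_start_paddr;  /* physical address of that page */
};

struct page {
	struct zone *zone;
};

#define page_to_phys(page) \
	((((page)-(page)->zone->zone_mem_map) << PAGE_SHIFT) \
	+ ((page)->zone->zone_start_paddr))

/* Hypothetical helper: the page's index within its zone, i.e. the
 * quantity the macro shifts by PAGE_SHIFT. */
static unsigned long page_index_in_zone(const struct page *page)
{
	assert(page->zone != NULL && page >= page->zone->zone_mem_map);
	return (unsigned long)(page - page->zone->zone_mem_map);
}
```

With a zone starting at physical 0x1000000, the page at index 3 of that
zone's mem_map comes out at 0x1003000.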

Kanoj 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-16 22:22         ` Kanoj Sarcar
@ 2000-08-17  9:11           ` Stephen C. Tweedie
  2000-08-17 19:07             ` Kanoj Sarcar
  0 siblings, 1 reply; 36+ messages in thread
From: Stephen C. Tweedie @ 2000-08-17  9:11 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: Stephen C. Tweedie, Roman Zippel, linux-mm, linux-kernel, rmk,
	nico, davem, davidm, alan

Hi,

On Wed, Aug 16, 2000 at 03:22:07PM -0700, Kanoj Sarcar wrote:

> > It's part of what is necessary if we want to push kiobufs into the
> > driver layers.  page_to_pfn is needed for PAE36 support so that
> > PCI64 or dual-address-cycle drivers can handle physical addresses
> > longer than 32 bits.
> 
> While we are on this topic, something like
> 
> #define page_to_phys(page) \
> 	((((page)-(page)->zone->zone_mem_map) << PAGE_SHIFT) \
> 	+ ((page)->zone->zone_start_paddr))
> 
> should work on all platforms on 2.4. (You might have to add in an
> unsigned long long somewhere in there for PAE36).

The long long is exactly what we need to avoid: PAE36 still has
pointers as 32-bit values.  Only ptes get the 64-bit treatment.

Adding a BUG() test to detect illegal accesses to >4GB pages on PAE36
would be fine.  If we have the appropriate bounce buffer support in
place in pci_dma or wherever suits it, then by the time a driver is
doing page_to_phys() it should already have created the appropriate
bounce buffers and so the BUG() test is fine. 

For DAC/PCI64 drivers, though, we need a separate macro like
page_to_pfn so that we can identify the physical address via a 32-bit
value.  The driver can then shift that into a 64-bit long long if it
wants to --- there's no need to introduce new 64-bit macros into the
mm just for this special case.
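
A minimal sketch of the driver side of that idea, assuming 4KB pages.
The helper names pfn_to_addr64 and pfn_to_addr32_truncated are
hypothetical, and fixed-width types are used so the sketch behaves the
same on any host:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4KB pages assumed */

/*
 * Hypothetical DAC/PCI64 driver helper: widen a 32-bit page frame number
 * plus an in-page offset into a 64-bit address. The cast has to happen
 * before the shift, otherwise the high bits are lost.
 */
static uint64_t pfn_to_addr64(uint32_t pfn, uint32_t offset)
{
	assert(offset < (1u << PAGE_SHIFT));
	return ((uint64_t)pfn << PAGE_SHIFT) | offset;
}

/* The same shift done in 32 bits, to show what gets truncated. */
static uint32_t pfn_to_addr32_truncated(uint32_t pfn)
{
	return pfn << PAGE_SHIFT;
}
```

The pfn of the first page above 4GB is 0x100000; widened it gives
0x100000000, while the 32-bit shift wraps to 0.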

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:07             ` Kanoj Sarcar
@ 2000-08-17 19:01               ` David S. Miller
  2000-08-17 19:19                 ` Alan Cox
  2000-08-17 19:32                 ` Kanoj Sarcar
  0 siblings, 2 replies; 36+ messages in thread
From: David S. Miller @ 2000-08-17 19:01 UTC (permalink / raw)
  To: kanoj; +Cc: sct

   Whatever you do, you either have to introduce paddr_t (which to me
   seems more intuitive) or page_to_pfn. We can argue one way or
   another, but paddr_t might give you type checking for free too ...

My first gripe about paddr_t is that long long is not only
expensive but has also been known to be buggy on 32-bit platforms.

The next gripe is that many clueless driver
etc. developers (who don't even read documentation, but write a large
portion of the vendor Linux drivers :-) will try to do things
like "void *p = (void *) (PAGE_OFFSET + x->paddr);" and expect
this to work, or maybe they'll even pass it to virt_to_bus or similar.

If people don't think these two things will be an issue, fine with
me. :-)

Which reminds me, we need to schedule a field day early in 2.5.x where
virt_to_bus and bus_to_virt are exterminated; this is the only way we
can move to drivers using page+offset correctly, forcing them through
interfaces such as the pci_dma API instead.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17  9:11           ` Stephen C. Tweedie
@ 2000-08-17 19:07             ` Kanoj Sarcar
  2000-08-17 19:01               ` David S. Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Kanoj Sarcar @ 2000-08-17 19:07 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Roman Zippel, linux-mm, linux-kernel, rmk, nico, davem, davidm, alan

> 
> Hi,
> 
> On Wed, Aug 16, 2000 at 03:22:07PM -0700, Kanoj Sarcar wrote:
> 
> > > It's part of what is necessary if we want to push kiobufs into the
> > > driver layers.  page_to_pfn is needed for PAE36 support so that
> > > PCI64 or dual-address-cycle drivers can handle physical addresses
> > > longer than 32 bits.
> > 
> > While we are on this topic, something like
> > 
> > #define page_to_phys(page) \
> > 	((((page)-(page)->zone->zone_mem_map) << PAGE_SHIFT) \
> > 	+ ((page)->zone->zone_start_paddr))
> > 
> > should work on all platforms on 2.4. (You might have to add in an
> > unsigned long long somewhere in there for PAE36).
> 
> The long long is exactly what we need to avoid: PAE36 still has
> pointers as 32-bit values.  Only ptes get the 64-bit treatment.
> 
> Adding a BUG() test to detect illegal accesses to >4GB pages on PAE36
> would be fine.  If we have the appropriate bounce buffer support in
> place in pci_dma or wherever suits it, then by the time a driver is
> doing page_to_phys() it should already have created the appropriate
> bounce buffers and so the BUG() test is fine. 
> 
> For DAC/PCI64 drivers, though, we need a separate macro like
> page_to_pfn so that we can identify the physical address via a 32-bit

Or, use a 64 bit value to represent physical addresses. Which is why
I am proposing paddr_t. In that case,

#define page_to_phys(page) \
	((((unsigned long long)((page)-(page)->zone->zone_mem_map)) \
	<< PAGE_SHIFT) + ((page)->zone->zone_start_paddr))

This would "work" (on i386) despite the fact that zone->zone_start_paddr is
"unsigned long" not "unsigned long long". 

Things would be much easier with paddr_t.

#define page_to_phys(page) \
	((((paddr_t)((page)-(page)->zone->zone_mem_map)) \
	<< PAGE_SHIFT) + ((page)->zone->zone_start_paddr))

and we would change zone->zone_start_paddr to be paddr_t too.

> value.  The driver can then shift that into a 64-bit long long if it
> wants to --- there's no need to introduce new 64-bit macros into the
> mm just for this special case.

Whatever you do, you either have to introduce paddr_t (which to me seems
more intuitive) or page_to_pfn. We can argue one way or another, but 
paddr_t might give you type checking for free too ...
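
One caveat on the type checking: a plain integer typedef is freely
mixed with other integers and pointers casts in C, so the checking
only appears if paddr_t is made a distinct type. A hedged sketch of
that variant follows; wrapping it in a struct, and the helper names
pfn_to_paddr, paddr_add and paddr_val, are assumptions of this
example, not something proposed in this thread:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4KB pages assumed */

/* Hypothetical: a one-member struct, so a paddr_t cannot be added to a
 * pointer or a plain integer by accident. */
typedef struct { uint64_t val; } paddr_t;

static paddr_t pfn_to_paddr(uint32_t pfn)
{
	paddr_t pa = { (uint64_t)pfn << PAGE_SHIFT };
	return pa;
}

static paddr_t paddr_add(paddr_t pa, uint64_t bytes)
{
	assert(bytes <= UINT64_MAX - pa.val);   /* no wrap-around */
	pa.val += bytes;
	return pa;
}

/* Explicit escape hatch for code that really needs the raw number. */
static uint64_t paddr_val(paddr_t pa)
{
	return pa.val;
}

/*
 * With this definition the compiler rejects, as invalid operands to "+":
 *
 *     void *p = (void *)(PAGE_OFFSET + pa);
 */
```

The cost is that every arithmetic operation needs a helper, which is
the usual trade-off of this style.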

Kanoj

> 
> Cheers,
>  Stephen
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:01               ` David S. Miller
@ 2000-08-17 19:19                 ` Alan Cox
  2000-08-17 19:20                   ` David S. Miller
  2000-08-17 19:24                   ` Alan Cox
  2000-08-17 19:32                 ` Kanoj Sarcar
  1 sibling, 2 replies; 36+ messages in thread
From: Alan Cox @ 2000-08-17 19:19 UTC (permalink / raw)
  To: David S. Miller
  Cc: kanoj, sct, roman, linux-mm, linux-kernel, rmk, nico, davidm, alan

>    Whatever you do, you either have to introduce paddr_t (which to me
>    seems more intuitive) or page_to_pfn. We can argue one way or
>    another, but paddr_t might give you type checking for free too ...
> 
> My first gripe about paddr_t is that long long is not only
> expensive but has also been known to be buggy on 32-bit platforms.

Except for the x86 36bit abortion do we need a long long paddr_t on any
32bit platform ?

> Which reminds me, we need to schedule a field day early in 2.5.x where
> virt_to_bus and bus_to_virt are exterminated; this is the only way we
> can move to drivers using page+offset correctly, forcing them through
> interfaces such as the pci_dma API instead.

So you'll be adding an isa_alloc_consistent, mca_alloc_consistent,
m68k_motherboard_alloc_consistent, ....

And then of course I need virt_to_bus/bus_to_virt to poke at things like
hardware on a PC and to access the ROMs.

It's not trivial to exterminate. It really isn't. The PCI API is a tiny subset
of the uses for those functions.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:19                 ` Alan Cox
@ 2000-08-17 19:20                   ` David S. Miller
  2000-08-17 19:33                     ` Alan Cox
                                       ` (2 more replies)
  2000-08-17 19:24                   ` Alan Cox
  1 sibling, 3 replies; 36+ messages in thread
From: David S. Miller @ 2000-08-17 19:20 UTC (permalink / raw)
  To: alan; +Cc: kanoj, sct, roman, linux-mm, linux-kernel, rmk, nico, davidm

   > My first gripe about paddr_t is that long long is not only
   > expensive but has also been known to be buggy on 32-bit platforms.

   Except for the x86 36bit abortion do we need a long long paddr_t on any
   32bit platform ?

Sparc32, mips32...

   > Which reminds me, we need to schedule a field day early in 2.5.x where
   > virt_to_bus and bus_to_virt are exterminated; this is the only way we
   > can move to drivers using page+offset correctly, forcing them through
   > interfaces such as the pci_dma API instead.

   So you'll be adding an isa_alloc_consistent, mca_alloc_consistent,
   m68k_motherboard_alloc_consistent, ....

I'll probably be adding isa_virt_to_bus, because when it is in fact
"ISA like" the driver already knows that it must be certain that the
physical address is below the 16MB mark right?  Then the cases left on
x86 are MCA (which can use the ISA interface) and PCI drivers which
must be updated to use the PCI dma API.
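
The 16MB constraint mentioned here can be written down as a simple
range check. This is only a sketch: isa_dma_reachable is a
hypothetical name, not an existing kernel interface, and it models
nothing beyond the 24-bit address limit:

```c
#include <assert.h>
#include <stdint.h>

#define ISA_DMA_LIMIT 0x1000000u   /* 16MB: 24-bit ISA DMA addressing */

/* Hypothetical helper: does [paddr, paddr + len) lie entirely below 16MB? */
static int isa_dma_reachable(uint64_t paddr, uint32_t len)
{
	assert(len > 0);
	/* The first test guarantees the subtraction cannot underflow. */
	return paddr < ISA_DMA_LIMIT && len <= ISA_DMA_LIMIT - paddr;
}
```

A buffer that starts below the limit but runs past it is rejected as
well, which is the case a start-address-only test would miss.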

Just like I did for SBUS, the m68k folks can deal with their issues
any way they like.

   It's not trivial to exterminate.

I think it is.  What's not trivial is getting bozos to clean up their
drivers.  For example, BTTV still doesn't use the PCI dma stuff simply
because nobody wishes to use their brains a little bit and encapsulate
the user DMA stuff into a common spot (it's duplicated in 4 or 5
drivers) which uses scatter gather lists with the DMA api.

Later,
David S. Miller
davem@redhat.com



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:19                 ` Alan Cox
  2000-08-17 19:20                   ` David S. Miller
@ 2000-08-17 19:24                   ` Alan Cox
  1 sibling, 0 replies; 36+ messages in thread
From: Alan Cox @ 2000-08-17 19:24 UTC (permalink / raw)
  To: Alan Cox
  Cc: David S. Miller, kanoj, sct, roman, linux-mm, linux-kernel, rmk,
	nico, davidm

> > can move to drivers using page+offset correctly, forcing them through
> > interfaces such as the pci_dma API instead.
> 
> So you'll be adding an isa_alloc_consistent, mca_alloc_consistent, 
> m68k_motherboard_alloc_consistent, ....
> 
> And then of course I need virt_to_bus/bus_to_virt to poke at things like
> hardware on a PC and to access the ROMs.

Umm wait - for those it's hidden inside the ioremap, so private... so it's just
the mca/zorro/whateverbus ones


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:32                 ` Kanoj Sarcar
@ 2000-08-17 19:30                   ` David S. Miller
  2000-08-17 20:00                     ` Kanoj Sarcar
  0 siblings, 1 reply; 36+ messages in thread
From: David S. Miller @ 2000-08-17 19:30 UTC (permalink / raw)
  To: kanoj; +Cc: sct, roman, linux-mm, linux-kernel, rmk, nico, davidm, alan

BTW, I've sed s/vger.rutgers.edu/vger.redhat.com/

   Wait! You are saying you have a scheme that will prevent writers 
   from writing buggy code that happens to work only on 32Mb i386 ...
   Go ahead, I am all ears :-)

I understand your point, but please understand mine.

One might laugh, but after I read and really considered some of the
points made by the author of "Writing Solid Code" in that book, I
realized that one of my jobs as someone creating an API is that I
should be trying as hard as possible to design it such that it is next
to impossible to misuse it.

Secondly, I learned that I shouldn't be adding APIs spuriously
because they will end up being maintained forever, re: the
kern_addr_looks_ok silliness :-)

So anyways, I was probably being overly anal for this particular case.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:01               ` David S. Miller
  2000-08-17 19:19                 ` Alan Cox
@ 2000-08-17 19:32                 ` Kanoj Sarcar
  2000-08-17 19:30                   ` David S. Miller
  1 sibling, 1 reply; 36+ messages in thread
From: Kanoj Sarcar @ 2000-08-17 19:32 UTC (permalink / raw)
  To: David S. Miller
  Cc: sct, roman, linux-mm, linux-kernel, rmk, nico, davidm, alan

> 
>    Whatever you do, you either have to introduce paddr_t (which to me
>    seems more intuitive) or page_to_pfn. We can argue one way or
>    another, but paddr_t might give you type checking for free too ...
> 
> My first gripe about paddr_t is that long long is not only
> expensive but has also been known to be buggy on 32-bit platforms.

Yeah, yeah, I know ... (I didn't know about the buggy bit though).
OTOH, paddr_t is such an intuitive concept (and without any disadvantages
on any platform other than i386-PAE), it's unfortunate if it
gets shot down just because of this ...

> 
> The next gripe is that many clueless driver
> etc. developers (who don't even read documentation, but write a large
> portion of the vendor Linux drivers :-) will try to do things
> like "void *p = (void *) (PAGE_OFFSET + x->paddr);" and expect
> this to work, or maybe they'll even pass it to virt_to_bus or similar.

Wait! You are saying you have a scheme that will prevent writers 
from writing buggy code that happens to work only on 32Mb i386 ...
Go ahead, I am all ears :-)

Basically, even with the pci-dma interfaces and sct's alternate
page_to_pfn suggestion, you still cannot claim that people will
not do

	void *p = (void *) (PAGE_OFFSET + (page_to_pfn(page) << PAGE_SHIFT));

and check that code in because it works on their 1Gb i386 box. No?
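
A small userspace sketch of why such code only appears to work on
small boxes. It assumes the i386 default PAGE_OFFSET of 0xC0000000 and
4KB pages, does the arithmetic in 32 bits as a 32-bit kernel would,
and uses hypothetical helper names. It only detects wrap-around; an
address that does not wrap may still be unmapped:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_OFFSET 0xC0000000u   /* i386 default kernel base; assumed */

/* The naive conversion, in 32-bit arithmetic. */
static uint32_t naive_pfn_to_virt(uint32_t pfn)
{
	return PAGE_OFFSET + (pfn << PAGE_SHIFT);
}

/* True only if the 32-bit result equals the exact (unwrapped) sum. */
static int naive_result_is_sane(uint32_t pfn)
{
	uint64_t exact = (uint64_t)PAGE_OFFSET + ((uint64_t)pfn << PAGE_SHIFT);

	return exact == naive_pfn_to_virt(pfn);
}

/* Worked cases: a page at 256MB is fine; pages at 1GB and 4GB wrap. */
static void naive_direct_map_demo(void)
{
	assert(naive_pfn_to_virt(0x10000u) == 0xD0000000u);
	assert(!naive_result_is_sane(0x40000u));
	assert(!naive_result_is_sane(0x100000u));
}
```

The page at physical 1GB already lands on virtual address 0, so the
bug stays hidden only as long as all tested memory sits below that.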

Kanoj
 
> If people don't think these two things will be an issue, fine with
> me. :-)
> 
> Which reminds me, we need to schedule a field day early in 2.5.x where
> virt_to_bus and bus_to_virt are exterminated; this is the only way we
> can move to drivers using page+offset correctly, forcing them through
> interfaces such as the pci_dma API instead.
> 
> Later,
> David S. Miller
> davem@redhat.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:20                   ` David S. Miller
@ 2000-08-17 19:33                     ` Alan Cox
  2000-08-17 19:36                     ` Kanoj Sarcar
  2000-08-17 19:50                     ` Kanoj Sarcar
  2 siblings, 0 replies; 36+ messages in thread
From: Alan Cox @ 2000-08-17 19:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: alan, kanoj, sct, roman, linux-mm, linux-kernel, rmk, nico, davidm

> I'll probably be adding isa_virt_to_bus, because when it is in fact
> "ISA like" the driver already knows that it must be certain that the

isa_alloc_consistent makes sense actually. It's needed for ISA bus masters
on ancient mips and other crap

> physical address is below the 16MB mark right?  Then the cases left on

16MB for ISA - except on a few late 486 era boxes with magic extensions (which
we'd finally be able to use)

> x86 are MCA (which can use the ISA interface) and PCI drivers which

No, the MCA bus is 32-bit - it's closer to PCI than ISA. mca_alloc_consistent is
doable, and if some loon ever does do old IBM power boxes it will be needed,
as they apparently aren't cache coherent MCA

> drivers.  For example, BTTV still doesn't use the PCI dma stuff simply
> because nobody wishes to use their brains a little bit and encapsulate
> the user DMA stuff into a common spot (it's duplicated in 4 or 5
> drivers) which uses scatter gather lists with the DMA api.

BTTV doesn't use it because the current stuff works, and for post 2.4 using
mmap_kiovec() and similar stuff will probably be a better solution - that
will also help us to push PCI bug awareness into pci, not drivers

Also mmap_kiovec will let i810 and some other sound cards do scatter gather
buffers sensibly.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:20                   ` David S. Miller
  2000-08-17 19:33                     ` Alan Cox
@ 2000-08-17 19:36                     ` Kanoj Sarcar
  2000-09-07 14:31                       ` Ralf Baechle
  2000-08-17 19:50                     ` Kanoj Sarcar
  2 siblings, 1 reply; 36+ messages in thread
From: Kanoj Sarcar @ 2000-08-17 19:36 UTC (permalink / raw)
  To: David S. Miller
  Cc: alan, sct, roman, linux-mm, linux-kernel, rmk, nico, davidm

> 
>    Date: Thu, 17 Aug 2000 20:19:59 +0100 (BST)
>    From: Alan Cox <alan@lxorguk.ukuu.org.uk>
> 
>    > My first gripe about paddr_t is that long long is not only
>    > expensive but has also been known to be buggy on 32-bit platforms.
> 
>    Except for the x86 36bit abortion do we need a long long paddr_t on any
>    32bit platform ?
> 
> Sparc32, mips32...
>

Not for Indys on mips32. Is there a mips32 port on another machine
(currently in Linux, or port ongoing) that requires this?

Kanoj

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:50                     ` Kanoj Sarcar
@ 2000-08-17 19:41                       ` David S. Miller
  2000-09-07 14:26                         ` Ralf Baechle
  2000-08-17 19:56                       ` Alan Cox
  1 sibling, 1 reply; 36+ messages in thread
From: David S. Miller @ 2000-08-17 19:41 UTC (permalink / raw)
  To: kanoj; +Cc: alan, sct, roman, linux-mm, linux-kernel, rmk, nico, davidm

   So, unlike system vendors adding in dma mapping registers for PCI32
   devices to dma anywhere into their >32 bit physical address space, you 
   are assuming no vendor will ever have a mapping scheme for ISA devices
   that let them get over the 16MB mark? 

ISA is a dead hardware technology and therefore how it works is pretty
much fixed in stone.

Perhaps some older MIPS machines supporting ISA could benefit from
an API similar to the PCI dma stuff, as Alan mentioned.  But that is
the only case which has any merit in my mind.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:20                   ` David S. Miller
  2000-08-17 19:33                     ` Alan Cox
  2000-08-17 19:36                     ` Kanoj Sarcar
@ 2000-08-17 19:50                     ` Kanoj Sarcar
  2000-08-17 19:41                       ` David S. Miller
  2000-08-17 19:56                       ` Alan Cox
  2 siblings, 2 replies; 36+ messages in thread
From: Kanoj Sarcar @ 2000-08-17 19:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: alan, sct, roman, linux-mm, linux-kernel, rmk, nico, davidm

> 
>    So you'll be adding an isa_alloc_consistent, mca_alloc_consistent, 
>    m68k_motherboard_alloc_consistent, ....
> 
> I'll probably be adding isa_virt_to_bus, because when it is in fact
> "ISA like" the driver already knows that it must be certain that the
> physical address is below the 16MB mark right?  Then the cases left on
> x86 are MCA (which can use the ISA interface) and PCI drivers which
> must be updated to use the PCI dma API.
>

Just a minor nit. 

So, unlike system vendors adding in dma mapping registers for PCI32
devices to dma anywhere into their >32 bit physical address space, you 
are assuming no vendor will ever have a mapping scheme for ISA devices
that let them get over the 16MB mark? 

Of course, I am not aware of ISA that much anyway (and I hope I don't
have to!), so please ignore this if it doesn't make sense.

Kanoj

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:50                     ` Kanoj Sarcar
  2000-08-17 19:41                       ` David S. Miller
@ 2000-08-17 19:56                       ` Alan Cox
  1 sibling, 0 replies; 36+ messages in thread
From: Alan Cox @ 2000-08-17 19:56 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: David S. Miller, alan, sct, roman, linux-mm, linux-kernel, rmk,
	nico, davidm

> So, unlike system vendors adding in dma mapping registers for PCI32
> devices to dma anywhere into their >32 bit physical address space, you 
> are assuming no vendor will ever have a mapping scheme for ISA devices
> that let them get over the 16MB mark? 

They did. Even on a few x86 boards. Supporting those bits of weirdness is
not important. If the ISA 16MB window is offset, someone can wrap it in their
arch specific isa_alloc_consistent code.




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:30                   ` David S. Miller
@ 2000-08-17 20:00                     ` Kanoj Sarcar
  0 siblings, 0 replies; 36+ messages in thread
From: Kanoj Sarcar @ 2000-08-17 20:00 UTC (permalink / raw)
  To: David S. Miller
  Cc: sct, roman, linux-mm, linux-kernel, rmk, nico, davidm, alan

> 
>    From: Kanoj Sarcar <kanoj@google.engr.sgi.com>
>    Date: Thu, 17 Aug 2000 12:32:35 -0700 (PDT)
> 
> BTW, I've sed s/vger.rutgers.edu/vger.redhat.com/
> 
>    Wait! You are saying you have a scheme that will prevent writers 
>    from writing buggy code that happens to work only on 32Mb i386 ...
>    Go ahead, I am all ears :-)
> 
> I understand your point, but please understand mine.
> 
> One might laugh, but after I read and really considered some of the
> points made by the author of "Writing Solid Code" in that book, I
> realized that one of my jobs as someone creating an API is that I
> should be trying as hard as possible to design it such that it is next
> to impossible to misuse it.

Unfortunately, where there's a will to misuse, there usually is a way :-(

And that's doubly hard to accept after all the hard work that goes into
creating the API ...

Kanoj

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:41                       ` David S. Miller
@ 2000-09-07 14:26                         ` Ralf Baechle
  0 siblings, 0 replies; 36+ messages in thread
From: Ralf Baechle @ 2000-09-07 14:26 UTC (permalink / raw)
  To: David S. Miller
  Cc: kanoj, alan, sct, roman, linux-mm, linux-kernel, rmk, nico, davidm

On Thu, Aug 17, 2000 at 12:41:52PM -0700, David S. Miller wrote:

> ISA is a dead hardware technology and therefore how it works is pretty
> much fixed in stone.
> 
> Perhaps some older MIPS machines supporting ISA could benefit from
> an API similar to the PCI dma stuff, as Alan mentioned.  But that is
> the only case which has any merit in my mind.

ISA isn't really a consideration for MIPS.  All that ISA hardware
could be supported by treating it the same as on an x86 system.  That's
not the most efficient but justified given the importance of ISA for MIPS
boxes - nearly NIL.

  Ralf

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: pte_pagenr/MAP_NR deleted in pre6
  2000-08-17 19:36                     ` Kanoj Sarcar
@ 2000-09-07 14:31                       ` Ralf Baechle
  0 siblings, 0 replies; 36+ messages in thread
From: Ralf Baechle @ 2000-09-07 14:31 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: David S. Miller, alan, sct, roman, linux-mm, linux-kernel, rmk,
	nico, davidm

On Thu, Aug 17, 2000 at 12:36:51PM -0700, Kanoj Sarcar wrote:

> >    Except for the x86 36bit abortion do we need a long long paddr_t on any
> >    32bit platform ?
> > 
> > Sparc32, mips32...
> >
> 
> Not for Indys on mips32. Is there a mips32 port on another machine
> (currently in Linux, or port ongoing) that requires this?

No.  Right now mips32 assumes that all memory is accessible in KSEG0, which
limits it to 512MB - epsilon.  I don't know of any 32-bit CPU
configuration which supports more memory than that, and for 64-bit processors
the policy should be to use mips64 - it's so much saner.

  Ralf

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2000-09-07 14:31 UTC | newest]

Thread overview: 36+ messages
2000-08-10 17:18 pte_pagenr/MAP_NR deleted in pre6 Kanoj Sarcar
2000-08-11  2:24 ` David S. Miller
2000-08-14  0:34   ` Anton Blanchard
2000-08-11 11:50 ` Roman Zippel
2000-08-11 13:20   ` Russell King
2000-08-11 14:56     ` Roman Zippel
2000-08-12  9:18       ` Bjorn Wesen
2000-08-11 17:21     ` Kanoj Sarcar
2000-08-14  9:29       ` Roman Zippel
2000-08-15 16:19 ` Stephen C. Tweedie
2000-08-16  8:25   ` Roman Zippel
2000-08-16 17:13     ` Kanoj Sarcar
2000-08-16 18:20       ` Stephen C. Tweedie
2000-08-16 18:24         ` David S. Miller
2000-08-16 19:53           ` Stephen C. Tweedie
2000-08-16 18:47         ` Kanoj Sarcar
2000-08-16 18:39           ` David S. Miller
2000-08-16 19:30             ` Stephen C. Tweedie
2000-08-16 22:22         ` Kanoj Sarcar
2000-08-17  9:11           ` Stephen C. Tweedie
2000-08-17 19:07             ` Kanoj Sarcar
2000-08-17 19:01               ` David S. Miller
2000-08-17 19:19                 ` Alan Cox
2000-08-17 19:20                   ` David S. Miller
2000-08-17 19:33                     ` Alan Cox
2000-08-17 19:36                     ` Kanoj Sarcar
2000-09-07 14:31                       ` Ralf Baechle
2000-08-17 19:50                     ` Kanoj Sarcar
2000-08-17 19:41                       ` David S. Miller
2000-09-07 14:26                         ` Ralf Baechle
2000-08-17 19:56                       ` Alan Cox
2000-08-17 19:24                   ` Alan Cox
2000-08-17 19:32                 ` Kanoj Sarcar
2000-08-17 19:30                   ` David S. Miller
2000-08-17 20:00                     ` Kanoj Sarcar
2000-08-16 18:17     ` Stephen C. Tweedie
