[RFC] On paging of kernel VM.

linux-mm.kvack.org archive mirror

* [RFC] On paging of kernel VM.
From: David Woodhouse @ 2002-09-09  9:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm

I think I'd like to introduce 'real' VMAs into kernel space, so that areas
in the vmalloc range can have 'real' vm_ops and more to the point a real
nopage function.

Unfortunately AFAICT this would involve changing the fault handler on every
platform, so I'm debating whether it's really worth it: whether anyone else
could use it, and whether I could get round my problem some other way.

The problem is flash chips. These basically behave as ROM, but you write to 
them by writing magic values to magic addresses, and during a write 
operation the _whole_ chip returns status bits instead of data.

To avoid taking up precious RAM with copies of data which are already in 
flash, we can map pages of flash directly into userspace. On taking a 
fault, we wait for any pending write to complete, mark the chip as busy, 
then set up the page tables appropriately so that userspace can read from 
it. On starting a write operation, you invalidate all currently-visible 
pages before starting to talk to the chip.

There are cases in the kernel where we'd really like the same setup --
mounting a JFFS2 file system, for example, is a slow operation because it's
entirely log-structured and we have to read every log entry on the file
system. The current method of reading into a RAM buffer under a lock and
then dealing with stuff in RAM is entirely suboptimal, and proof-of-concept
hacks to just use a pointer into the flash chip have been observed to
improve mount time by about a factor of 4.

The locking is a problem though. Flash chips may be divided into multiple
partitions and other code may want to write to its partition while a mount
is in progress. The naive approach of just locking the chip into read mode
on giving out a pointer to it, and unlocking it when the mount is complete,
is going to suck royally. Hence, it would be very nice if we could play the
same trick as we do for userspace; giving out a pointer which is always
going to be valid; you just might have to wait for it. 

But as I said, this means screwing with every fault handler. It doesn't 
have to affect the fast path -- we can go looking for these vmas only in 
the case where we've already tried looking for the appropriate pte in 
init_mm and haven't found it. But it's still an intrusive change that would 
need to be done on every architecture.
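
The proposed ordering of the slow path can be sketched in user-space C. Everything here is a toy model under stated assumptions: a 16-page address space, a hypothetical `struct kvma` carrying a `nopage` hook, and `kernel_fault()` standing in for an architecture's vmalloc fault handling. The existing `init_mm` lookup stays first, and the kernel-VMA search only runs when that finds nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_KPAGES 16

typedef uintptr_t toy_pte_t;                 /* 0 means "not present" */

/* Hypothetical kernel VMA with a nopage hook. */
struct kvma {
    uintptr_t start, end;                    /* covers [start, end) */
    toy_pte_t (*nopage)(struct kvma *vma, uintptr_t addr);
};

static toy_pte_t init_mm_ptes[TOY_KPAGES];   /* reference kernel table */
static toy_pte_t current_ptes[TOY_KPAGES];   /* faulting context's copy */
static struct kvma *kvmas[4];
static size_t nr_kvmas;

static void register_kvma(struct kvma *vma)
{
    if (nr_kvmas < sizeof(kvmas) / sizeof(kvmas[0]))
        kvmas[nr_kvmas++] = vma;
}

static struct kvma *find_kvma(uintptr_t addr)
{
    for (size_t i = 0; i < nr_kvmas; i++)
        if (addr >= kvmas[i]->start && addr < kvmas[i]->end)
            return kvmas[i];
    return NULL;
}

/* Example nopage: a real flash driver would wait for the chip here. */
static toy_pte_t flash_nopage(struct kvma *vma, uintptr_t addr)
{
    (void)vma;
    return (addr >> TOY_PAGE_SHIFT) | 0x1;   /* fake "present" pte */
}

/* Returns 0 if the fault was fixed up, -1 for a genuine oops. */
static int kernel_fault(uintptr_t addr)
{
    size_t idx = addr >> TOY_PAGE_SHIFT;
    struct kvma *vma;
    toy_pte_t pte;

    if (idx >= TOY_KPAGES)
        return -1;
    /* Existing vmalloc path: copy the entry from init_mm if present. */
    if (init_mm_ptes[idx]) {
        current_ptes[idx] = init_mm_ptes[idx];
        return 0;
    }
    /* New slow path, reached only when init_mm has no pte either. */
    vma = find_kvma(addr);
    if (!vma || !vma->nopage)
        return -1;
    pte = vma->nopage(vma, addr);
    if (!pte)
        return -1;
    init_mm_ptes[idx] = pte;
    current_ptes[idx] = pte;
    return 0;
}
```

Because the VMA search only runs after the `init_mm` lookup fails, faults on ordinary vmalloc addresses never reach it.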

I'm wondering what else could use this if it were implemented. Is there any
need for something like vmalloc_pageable(), for example? Anything else?
Rusty and I have wittered about marking certain kernel functions and data as
__pageable, so that they go into such a special section too, but I'm wondering
if that conversation was slightly Guinness-influenced :)

Or is there another way to solve my original problem that I've overlooked?

Answers on a postcard to...

--
dwmw2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

* Re: [RFC] On paging of kernel VM.
From: Stephen C. Tweedie @ 2002-09-09 11:13 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-kernel, linux-mm, Stephen Tweedie

Hi,

On Mon, Sep 09, 2002 at 10:20:53AM +0100, David Woodhouse wrote:
> I think I'd like to introduce 'real' VMAs into kernel space, so that areas
> in the vmalloc range can have 'real' vm_ops and more to the point a real
> nopage function.

The alternative is a kmap-style mechanism for temporarily mapping
pages beyond physical memory on demand.  That would avoid the space
limits we have on vmalloc etc; there are only a few tens of MB of
address space we can use for mmap tricks in kernel space, so
persistent maps are seriously constrained if you've got a lot of flash
you want to map.

And with a kmap interface, your locking problems are much simpler ---
you can trap accesses at source and you don't have to go hunting ptes
to invalidate.
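
Roughly, such a kmap-style interface could look like the following sketch. It assumes a hypothetical fixed pool of `MAP_SLOTS` mapping slots (standing in for the limited kernel address space) and a per-device count of active maps; none of these names are existing kernel APIs. The chip state is checked when a mapping is requested, so a writer only waits for the count to reach zero instead of hunting down ptes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAP_SLOTS 4              /* hypothetical: limited kernel window */

struct flash_dev {
    bool writing;
    int active_maps;             /* outstanding flash_kmap() calls */
    long slot_page[MAP_SLOTS];   /* -1 when the slot is free */
};

static void flash_dev_init(struct flash_dev *dev)
{
    dev->writing = false;
    dev->active_maps = 0;
    for (int i = 0; i < MAP_SLOTS; i++)
        dev->slot_page[i] = -1;
}

/* Returns a slot index, or -1 if the caller must wait (write in progress
 * or no free slot). Access is trapped here, at the source. */
static int flash_kmap(struct flash_dev *dev, long page)
{
    if (dev->writing)
        return -1;
    for (int i = 0; i < MAP_SLOTS; i++) {
        if (dev->slot_page[i] < 0) {
            dev->slot_page[i] = page;
            dev->active_maps++;
            return i;
        }
    }
    return -1;
}

static void flash_kunmap(struct flash_dev *dev, int slot)
{
    dev->slot_page[slot] = -1;
    dev->active_maps--;
}

/* A writer may start only when nobody holds a mapping: no pte hunt. */
static bool flash_try_start_write(struct flash_dev *dev)
{
    if (dev->active_maps > 0)
        return false;
    dev->writing = true;
    return true;
}

static void flash_end_write(struct flash_dev *dev)
{
    dev->writing = false;
}
```

The fixed slot pool is what keeps address-space use bounded no matter how much flash is present; the cost is that every user must map and unmap explicitly.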

Cheers,
 Stephen

* Re: [RFC] On paging of kernel VM.
From: David Woodhouse @ 2002-09-09 11:24 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-kernel, linux-mm

sct@redhat.com said:
>  The alternative is a kmap-style mechanism for temporarily mapping
> pages beyond physical memory on demand.

That's a possibility I'd considered, but in this case there are problems
with explicitly mapping and unmapping the pages. The locking of the chip is
a detail I was hoping to avoid exposing to the users of the device. 

With mapping/unmapping done explicitly, not only does an active mapping
prevent all other users from writing to the same device, hence requiring a
'cond_temporarily_unmap()' kind of function, but you also get deadlock if a
user of the device tries to write while they have a mapping active. The
answer "don't do that then" is workable but not preferable. 
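
To make the two hazards concrete, here is a hypothetical single-threaded model of the explicit-mapping API. The struct names and the `cond_temporarily_unmap()` signature are assumptions; only the function's name comes from this mail. A write from a user who still holds a mapping is reported as `-EDEADLK` instead of hanging, and `cond_temporarily_unmap()` drops a mapping only when a writer is queued.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical shared device state for the explicit-mapping API. */
struct flash_state {
    int active_maps;      /* mappings held by all users */
    int writers_waiting;  /* writers blocked on active_maps */
};

struct flash_user {
    struct flash_state *dev;
    bool has_map;
    bool waiting;         /* this user is a queued writer */
};

static void user_map(struct flash_user *u)
{
    if (!u->has_map) {
        u->has_map = true;
        u->dev->active_maps++;
    }
}

static void user_unmap(struct flash_user *u)
{
    if (u->has_map) {
        u->has_map = false;
        u->dev->active_maps--;
    }
}

/* A write must wait for active_maps to reach zero; if the caller holds
 * one of those maps, it would be waiting on itself forever. */
static int user_write(struct flash_user *u)
{
    if (u->has_map)
        return -EDEADLK;            /* "don't do that then" */
    if (u->dev->active_maps > 0) {
        if (!u->waiting) {
            u->waiting = true;
            u->dev->writers_waiting++;
        }
        return -EAGAIN;             /* would sleep until maps drop */
    }
    if (u->waiting) {
        u->waiting = false;
        u->dev->writers_waiting--;
    }
    return 0;                       /* write may proceed */
}

/* Drop the map only if a writer is queued; returns true when the caller
 * has to remap before touching the flash again. */
static bool cond_temporarily_unmap(struct flash_user *u)
{
    if (!u->has_map || u->dev->writers_waiting == 0)
        return false;
    user_unmap(u);
    return true;
}
```

Every long-running reader would have to sprinkle such calls through its loops, which is the API burden that implicit faulting avoids.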

Given that all the logic to mark pages present on read and then invalidate
them on write access is going to have to be there for userspace _anyway_,
being able to keep the API nice and simple by using that in kernelspace too
would be far better, if we can justify the change to the slow path of the 
vmalloc fault case.

But yes, what you suggest is the current API for the flash stuff, sans the 
'cond_temporarily_unmap_if_people_are_waiting()' bit. And that's why I've 
avoided actually _using_ it, preferring to put up with the overhead of 
reading into a RAM buffer until we can fix it.

--
dwmw2

* Re: [RFC] On paging of kernel VM.
From: Daniel Phillips @ 2002-09-10  0:28 UTC (permalink / raw)
  To: David Woodhouse, linux-kernel; +Cc: linux-mm

On Monday 09 September 2002 11:20, David Woodhouse wrote:
> But as I said, this means screwing with every fault handler. It doesn't 
> have to affect the fast path -- we can go looking for these vmas only in 
> the case where we've already tried looking for the appropriate pte in 
> init_mm and haven't found it. But it's still an intrusive change that would 
> need to be done on every architecture.

Why can't you go per-architecture and fall back to the slow way of doing it
for architectures that don't have the new functionality yet?

-- 
Daniel

* Re: [RFC] On paging of kernel VM.
From: David Woodhouse @ 2002-09-10  6:08 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel, linux-mm

phillips@arcor.de said:
>  Why can't you go per-architecture and fall back to the slow way of
> doing it for architectures that don't have the new functionality yet? 

No. We can't make this kind of change to the way the vmalloc region works on
some architectures only. It has to remain uniform.

Either it's worth doing for all, or it's not. It's a fairly trivial change
in the slow path, after all. I suspect it's worth it -- I'll ask the same 
question again with a patch attached as soon as I get time, in order to 
elicit more responses.

--
dwmw2
