linux-mm.kvack.org archive mirror
* [PATCH] dirty pages in memory & co.
@ 1999-05-07 14:56 Eric W. Biederman
  1999-05-10  0:57 ` Andrea Arcangeli
  1999-05-10 19:37 ` Stephen C. Tweedie
  0 siblings, 2 replies; 9+ messages in thread
From: Eric W. Biederman @ 1999-05-07 14:56 UTC (permalink / raw)
  To: linux-mm

O.k.  I've dug through all of my obvious bugs and I have a working set of
kernel patches.
Currently they are against 2.2.5.
Location:
http://ebiederm/files/patches9.tar.gz

I consider this set of patches alpha (as I haven't had a chance to
stress test it yet).  But if you want to see the direction I'm going
it's a good thing to look at.

Documentation, porting shmfs, and stress testing are still to come.

The patches included are:
eb1 -- Allow reuse of page->buffers if you aren't the buffer cache
eb2 -- Allow old old a.out binaries to run even if we can't mmap them
       properly because their data isn't page aligned.
eb3 -- Muck with page offset.
eb4 -- Allow registration and unregistration for functions needed by
       swap off.
eb5 -- Large file support, basically this removes unused bits from all
       of the relevant interfaces.
eb6 -- Introduction of struct vm_store, and associated cleanups.
       In particular get_inode_page.
eb7 -- Actual patch for dirty buffers in the page cache.
       I'm fairly well satisfied except for generic_file_write.
       It looks like I need 2 variations on generic_file_write at the
       moment. 
       1) for network filesystems that can get away without filling
          the page on a partial write.
       2) for block based filesystems that must fill the page on a
          partial write because they can't write arbitrary chunks of
          data.

TODO:
1) Documenting the new interfaces.
2) Porting shmfs.
3) Stress testing.
4) Experimenting with heuristics so that programs that are writing
   data faster than the disk can handle are put to sleep (think
   floppies).
5) Playing with mapped memory, and removing the need for kpiod.
   This will require either reverse page tables, or something equally
   effective at finding page mappings from a struct page.
6) Removing the need for struct vm_operations in the vm_area_struct.
   A struct vm_store can probably handle everything.
7) Removing the swap lock map, by modifying ipc/shm to use the page cache
   and vm_stores.

I'm going to visit my parents this weekend so I don't expect to get
much farther for a while. 

Eric


--
To unsubscribe, send a message with 'unsubscribe linux-mm my@address'
in the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dirty pages in memory & co.
  1999-05-07 14:56 [PATCH] dirty pages in memory & co Eric W. Biederman
@ 1999-05-10  0:57 ` Andrea Arcangeli
  1999-05-11  1:06   ` Eric W. Biederman
  1999-05-10 19:37 ` Stephen C. Tweedie
  1 sibling, 1 reply; 9+ messages in thread
From: Andrea Arcangeli @ 1999-05-10  0:57 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-mm

On 7 May 1999, Eric W. Biederman wrote:

>7) Removing the swap lock map, by modifying ipc/shm to use the page cache
>   and vm_stores.

I just killed the swap lock map; I now use the swap cache for ipc shm
memory.

Now I was thinking about the reverse lookup from pagemap to pagetable that
you mentioned. It would be easy to do that, at least for the page/swap cache
mappings, with the interface I added in my tree.

But to support dynamic relocation/defrag of memory across the whole VM we
should do that for _all_ pages. And to do the relocation we should run
with the GFP pages mapped in a separate pte (not in the 4mbyte page table
with the kernel). So I don't know if it would be better to move all
kernel memory (the memory available through GFP) to virtual memory and
support the reverse lookup for all pages in the system, or if I should
only do the quite-easy backdoor for the page/swap cache. The point is that
supporting the reverse lookup for all kernel memory, and having all kernel
memory in virtual memory, would in my opinion be a _major_ performance hit
for all operations.

Right now I would need the reverse lookup only for the mapped cache,
because I would like to avoid running swap_out just to learn whether the
pte has been accessed; if it's an old pte I could unmap the mmapped page
directly from shrink_mmap. But I am not convinced this would be an
improvement either, because swap_out already runs at the right time...

Comments?

Andrea Arcangeli


* Re: [PATCH] dirty pages in memory & co.
  1999-05-07 14:56 [PATCH] dirty pages in memory & co Eric W. Biederman
  1999-05-10  0:57 ` Andrea Arcangeli
@ 1999-05-10 19:37 ` Stephen C. Tweedie
  1999-05-10 21:01   ` Benjamin C.R. LaHaise
  1999-05-11  0:30   ` Eric W. Biederman
  1 sibling, 2 replies; 9+ messages in thread
From: Stephen C. Tweedie @ 1999-05-10 19:37 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-mm, Stephen Tweedie

Hi,

On 07 May 1999 09:56:00 -0500, ebiederm+eric@ccr.net (Eric W. Biederman)
said:

>        It looks like I need 2 variations on generic_file_write at the
>        moment. 
>        1) for network filesystems that can get away without filling
>           the page on a partial write.
>        2) for block based filesystems that must fill the page on a
>           partial write because they can't write arbitrary chunks of
>           data.

I'd be very worried by (1): sounds like a partial write followed by a
read of the full page could show up garbage in the page cache if you do
this.  If NFS skips the page clearing for partial writes, how does it
avoid returning garbage later?

--Stephen



* Re: [PATCH] dirty pages in memory & co.
  1999-05-10 19:37 ` Stephen C. Tweedie
@ 1999-05-10 21:01   ` Benjamin C.R. LaHaise
  1999-05-10 23:43     ` Stephen C. Tweedie
  1999-05-11  0:30   ` Eric W. Biederman
  1 sibling, 1 reply; 9+ messages in thread
From: Benjamin C.R. LaHaise @ 1999-05-10 21:01 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Eric W. Biederman, linux-mm

On Mon, 10 May 1999, Stephen C. Tweedie wrote:

> On 07 May 1999 09:56:00 -0500, ebiederm+eric@ccr.net (Eric W. Biederman)
> said:
> 
> >        It looks like I need 2 variations on generic_file_write at the
> >        moment. 
> >        1) for network filesystems that can get away without filling
> >           the page on a partial write.
> >        2) for block based filesystems that must fill the page on a
> >           partial write because they can't write arbitrary chunks of
> >           data.
> 
> I'd be very worried by (1): sounds like a partial write followed by a
> read of the full page could show up garbage in the page cache if you do
> this.  If NFS skips the page clearing for partial writes, how does it
> avoid returning garbage later?

Hmmm, it shouldn't be a problem if the write blocks the reading of the
page and PG_uptodate isn't set.  This conflicts with the current
assumption in generic_file_read that a locked page becoming unlocked
without PG_uptodate being set indicates an error -- the best thing here
is probably to add a PG_error flag and do away with the overloading.
Everything else should be checking PG_uptodate, right?

		-ben


* Re: [PATCH] dirty pages in memory & co.
  1999-05-10 21:01   ` Benjamin C.R. LaHaise
@ 1999-05-10 23:43     ` Stephen C. Tweedie
  0 siblings, 0 replies; 9+ messages in thread
From: Stephen C. Tweedie @ 1999-05-10 23:43 UTC (permalink / raw)
  To: Benjamin C.R. LaHaise; +Cc: Stephen C. Tweedie, Eric W. Biederman, linux-mm

Hi,

On Mon, 10 May 1999 17:01:50 -0400 (EDT), "Benjamin C.R. LaHaise"
<blah@kvack.org> said:

> Hmmm, it shouldn't be a problem if the write blocks the reading of the
> page and PG_uptodate isn't set.  This conflicts with the current
> assumption in generic_file_read that a locked page becoming unlocked
> without PG_uptodate being set indicates an error -- the best thing here
> is probably to add a PG_error flag and do away with the overloading.

I'm not convinced: doing an explicit read-page and waking up to find it
not uptodate sure sounds like an error to me.  If we find a page which
isn't uptodate, then the first thing we do is try to read it, we don't
generate the error immediately.  Why do we need a new flag?

--Stephen

* Re: [PATCH] dirty pages in memory & co.
  1999-05-10 19:37 ` Stephen C. Tweedie
  1999-05-10 21:01   ` Benjamin C.R. LaHaise
@ 1999-05-11  0:30   ` Eric W. Biederman
  1 sibling, 0 replies; 9+ messages in thread
From: Eric W. Biederman @ 1999-05-11  0:30 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-mm

>>>>> "ST" == Stephen C Tweedie <sct@redhat.com> writes:

ST> Hi,
ST> On 07 May 1999 09:56:00 -0500, ebiederm+eric@ccr.net (Eric W. Biederman)
ST> said:

>> It looks like I need 2 variations on generic_file_write at the
>> moment. 
>> 1) for network filesystems that can get away without filling
>> the page on a partial write.
>> 2) for block based filesystems that must fill the page on a
>> partial write because they can't write arbitrary chunks of
>> data.

ST> I'd be very worried by (1): sounds like a partial write followed by a
ST> read of the full page could show up garbage in the page cache if you do
ST> this.  If NFS skips the page clearing for partial writes, how does it
ST> avoid returning garbage later?

Actually (1) is current behaviour.  I really don't like it but I can see
how it can potentially improve performance.  Partial writes are handled
by not setting PG_uptodate.

Reads are handled by always flushing the per page dirty data before reading.

I don't especially like it but it's what we have now.

Eric

* Re: [PATCH] dirty pages in memory & co.
  1999-05-10  0:57 ` Andrea Arcangeli
@ 1999-05-11  1:06   ` Eric W. Biederman
  1999-05-11 11:38     ` Andrea Arcangeli
  0 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 1999-05-11  1:06 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-mm

>>>>> "AA" == Andrea Arcangeli <andrea@e-mind.com> writes:

AA> On 7 May 1999, Eric W. Biederman wrote:
>> 7) Removing the swap lock map, by modifying ipc/shm to use the page cache
>> and vm_stores.

AA> I just killed the swap lock map and I just use the swap cache for ipc shm
AA> memory.

Cool.  I'll have to take a look.  It should save some work.

AA> Now I was thinking about the reverse lookup from pagemap to pagetable that
AA> you mentioned. It would be easy to do that, at least for the page/swap cache
AA> mappings, with the interface I added in my tree.

AA> But to support dynamic relocation/defrag of memory across the whole VM we
AA> should do that for _all_ pages. And to do the relocation we should run
AA> with the GFP pages mapped in a separate pte (not in the 4mbyte page table
AA> with the kernel). So I don't know if it would be better to move all
AA> kernel memory (the memory available through GFP) to virtual memory and
AA> support the reverse lookup for all pages in the system, or if I should
AA> only do the quite-easy backdoor for the page/swap cache. The point is that
AA> supporting the reverse lookup for all kernel memory, and having all kernel
AA> memory in virtual memory, would in my opinion be a _major_ performance hit
AA> for all operations.

Right, and it doesn't buy you very much.
We have some constants.
A) Defragging doesn't need to happen often.
B) We don't have much kernel memory tied up in locked down memory 
   (that we refer to with pointers).
C) For locked down memory there is also internal fragmentation to worry about.
D) The biggest gain is handling memory that isn't locked down.

So I think I would first concentrate on the common case, pages mappable
in user space, the page cache and anonymous memory.

For locked-down memory a good solution looks like an incremental copying
collector. Not that we need the garbage-collecting properties, but it is
the only incremental memory-packing algorithm I currently know. And we
could take advantage of fine-grained smp locks to implement the needed
write barrier.

AA> Right now I would need the reverse lookup only for the mapped cache,
AA> because I would like to avoid running swap_out just to learn whether the
AA> pte has been accessed; if it's an old pte I could unmap the mmapped page
AA> directly from shrink_mmap. But I am not convinced this would be an
AA> improvement either, because swap_out already runs at the right time...

AA> Comments?

The reason I am looking at reverse page entries is that I would like to
handle dirty mapped pages better.
My thought is basically to trap the fault that dirties the page and mark
the page dirty. Then, after it has aged long enough, I unmap or at least
clear the write-allow bits of the pte or ptes.

This does buy an improvement in when things get written out. But beyond
that I don't know.

It's certainly something to think about for your other algorithms.
The current scheme seems good enough for the most part however.

Eric

* Re: [PATCH] dirty pages in memory & co.
  1999-05-11  1:06   ` Eric W. Biederman
@ 1999-05-11 11:38     ` Andrea Arcangeli
  1999-05-11 18:45       ` Stephen C. Tweedie
  0 siblings, 1 reply; 9+ messages in thread
From: Andrea Arcangeli @ 1999-05-11 11:38 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-mm

On 10 May 1999, Eric W. Biederman wrote:

>The reason I am looking at reverse page entries is that I would like to
>handle dirty mapped pages better.
>My thought is basically to trap the fault that dirties the page and mark
>the page dirty. Then, after it has aged long enough, I unmap or at least
>clear the write-allow bits of the pte or ptes.
>
>This does buy an improvement in when things get written out. But beyond
>that I don't know.

Having the reverse lookup from pagemap to ptes would also make life a bit
easier in my update_shared_mappings ;). So in general I see your point.
Consider: when you clear the dirty bit from the pagemap, you'll also want
to mark the pte clean in the tasks. Right?

But I am worried about page faults. The page fault that allows us to know
when there is an uptodate swap entry on disk hurts performance more than
not having such information (I did benchmarks).

>It's certainly something to think about for your other algorithms.

I am not sure if it's worthwhile, but I think it's worth testing ;).

Andrea Arcangeli


* Re: [PATCH] dirty pages in memory & co.
  1999-05-11 11:38     ` Andrea Arcangeli
@ 1999-05-11 18:45       ` Stephen C. Tweedie
  0 siblings, 0 replies; 9+ messages in thread
From: Stephen C. Tweedie @ 1999-05-11 18:45 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Eric W. Biederman, linux-mm

Hi,

On Tue, 11 May 1999 13:38:48 +0200 (CEST), Andrea Arcangeli
<andrea@e-mind.com> said:

> But I am worried about page faults. The page fault that allows us to know
> when there is an uptodate swap entry on disk hurts performance more than
> not having such information (I did benchmarks).

It obviously depends on whether you are swap-bound or CPU-bound.  Have
you tried both?

One thing I definitely agree with is that it may sometimes be preferable
to drop the swap cache to avoid fragmentation.  If we have a new dirty
page requiring writing to swap, and its VA neighbours are already in the
swap cache, it makes sense to eliminate the swap cache and write all the
pages to the new location to keep them contiguous on disk.  

The real aim here is to allow us to keep dirty pages in the swap cache
too: this will allow us to keep good, unfragmented swap allocations by
persistently assigning a contiguous range of swap to a contiguous range
of process data pages, even if the process is only dirtying some of
those pages.

--Stephen
