linux-mm.kvack.org archive mirror
* Re: Why don't shared anonymous mappings work?
@ 1999-01-14  3:07 Colin Plumb
  1999-01-15  6:07 ` Benjamin C.R. LaHaise
  0 siblings, 1 reply; 11+ messages in thread
From: Colin Plumb @ 1999-01-14  3:07 UTC (permalink / raw)
  To: blah; +Cc: linux-mm

Ben LaHaise wrote:
> On Wed, 13 Jan 1999, Colin Plumb wrote:
> 
>> Um, I just thought of another problem with shared anonymous pages.
>> It's similar to the zero-page issue you raised, but it's no longer
>> a single special case.
>> 
>> Copy-on-write and shared mappings.  Let's say that process 1 has a COW
>> copy of page X.  Then the page is shared (via mmap /proc/1/mem or some
>> such) with process 2.  Now process 1 writes to the page.
> 
> Shared mappings *cannot* be COW.  The mmap(/proc/<pid>/mem) was killed for
> good reasons, and if it's truly necessary, I suggest that mmap enforce
> that the mapping is of the same (or compatible in the case of files)
> nature.  Anything less overcomplicates things needlessly, leading to a
> Mach like vm system.

Um, okay, how about a more plausible scenario.  Processes 1 and 2
share a page X.  Process 1 forks.

Doesn't this lead to the hairy Mach-like situation?

>> It *is* possible to link PTE entries together in a singly-linked list
>> where a pointer to another PTE is distinguishable from a pointer to
>> a disk block or a valid PTE.  I have thought of using this to update
>> more PTEs when a page is swapped in, as the swapper-in would traverse
>> the list to find the page at the end, swap it in if necessary, and
>> copy the mapping to all the entries it traversed.
> 
> Been there, done it, got the t-shirt.  Creating actual lists for ptes is a
> Bad Idea (tm), as I learned in my original pte_list patch.  It doubles the
> size of the page tables and is *slow*.  Plus, as Linus pointed out a long
> time ago, we've got the information already -- and there's an easy way to
> get it for the anonymous case too (it just requires a wee bit 'o
> bookkeeping on anonymous vmas).  I'm in the process of updating the patch
> now that things are stabilizing, so stay tuned...

Um, I think you fail to understand.  I was talking about a linked list
*without* allocating extra space.  The idea is that I don't know of a
processor that requires more than 2 bits (M68K) to mark a PTE as invalid;
the user gets the rest.  Currently the user bits in the invalid PTE
encodings point to swap pages.  You could steal one bit and point to
either a word in memory or a swap page.

Because memory does not fill the entire virtual address space and has
alignment constraints, the number of bits needed for the pointer leaves
room for the invalid flag bit(s) and the memory/swap pointer-type bit.

I am quite aware that enlarging all of the PTEs in the system is a cost to
be avoided if at all possible.
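
The bit-stealing idea above can be sketched as follows. This is a minimal illustration only: the layout (two low bits zero for "invalid", one type bit, a 29-bit payload) and every name in it are assumptions made for the example, not the encoding of any real port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout of a 32-bit not-present PTE (not any real port's):
 *   bits 0-1   both zero, which marks the entry invalid
 *   bit  2     type: 0 = swap entry, 1 = pointer to another PTE
 *   bits 3-31  29-bit payload
 * A real port would also reserve the all-zero value for "no page";
 * that detail is ignored here.
 */
#define PTE_VALID_MASK    0x3u
#define PTE_TYPE_LINK     0x4u
#define PTE_PAYLOAD_SHIFT 3

typedef uint32_t pte_t;

static bool pte_is_invalid(pte_t pte)
{
    return (pte & PTE_VALID_MASK) == 0;
}

static bool pte_is_link(pte_t pte)
{
    return pte_is_invalid(pte) && (pte & PTE_TYPE_LINK) != 0;
}

/* Encode a swap offset; it must fit in the 29-bit payload. */
static pte_t make_swap_pte(uint32_t swap_offset)
{
    assert(swap_offset < (1u << 29));
    return swap_offset << PTE_PAYLOAD_SHIFT;
}

/*
 * Encode a pointer to another PTE.  PTEs are 4-byte aligned, so the two
 * low address bits are implied and 29 payload bits cover 2 GB of memory.
 */
static pte_t make_link_pte(uint32_t pte_phys_addr)
{
    assert((pte_phys_addr & 3u) == 0);
    assert(pte_phys_addr < (1u << 31));
    return ((pte_phys_addr >> 2) << PTE_PAYLOAD_SHIFT) | PTE_TYPE_LINK;
}

static uint32_t pte_payload(pte_t pte)
{
    return pte >> PTE_PAYLOAD_SHIFT;
}

/* Recover the physical address of the PTE a link entry points at. */
static uint32_t link_target(pte_t pte)
{
    return pte_payload(pte) << 2;
}
```

No PTE grows in this scheme; the cost is one payload bit, which halves the range of swap offsets an invalid entry can hold.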
-- 
	-Colin
--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org

* Re: Why don't shared anonymous mappings work?
@ 1999-01-15  6:43 Colin Plumb
  0 siblings, 0 replies; 11+ messages in thread
From: Colin Plumb @ 1999-01-15  6:43 UTC (permalink / raw)
  To: blah; +Cc: linux-mm

>> Um, I think you fail to understand.  I was talking about a linked list
>> *without* allocating extra space.  The idea is that I don't know of a
>> processor that requires more than 2 bits (M68K) to mark a PTE as invalid;
>> the user gets the rest.  Currently the user bits in the invalid PTE
>> encodings point to swap pages.  You could steal one bit and point to
>> either a word in memory or a swap page.

> Ooops, brain fart (sometimes you read, but the meaning just isn't
> absorbed).  I think assuming that you can get 30 bits out of a pte on a 32
> bit platform to use as a pointer is pushing things, though (and you do
> need all the bits: mremap allows users to move shared pages to different
> offsets within a page table).

Not quite.  You only need as many bits as are needed to address all of
physical memory minus the number of bits implied by PTE alignment.

So, for a 2 GB machine, you need 29 bits for a pointer to a 32-bit word.
Plus one for the type bit does equal 30, but many machines are smaller.
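
The arithmetic above can be written out as a small helper. This is only a sketch: the function names are made up for the example, and it assumes the PTE size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest b such that 2^b >= n, for n >= 1. */
static unsigned ceil_log2(uint64_t n)
{
    unsigned bits = 0;

    while (bits < 64 && ((uint64_t)1 << bits) < n)
        bits++;
    return bits;
}

/*
 * Bits an invalid PTE must offer in order to point at any PTE-sized word
 * in physical memory, plus one bit to tell such a pointer from a swap
 * entry.  pte_bytes is assumed to be a power of two.
 */
static unsigned link_bits_needed(uint64_t phys_mem_bytes, unsigned pte_bytes)
{
    unsigned addr_bits = ceil_log2(phys_mem_bytes);
    unsigned align_bits = ceil_log2(pte_bytes); /* implied zero low bits */

    assert(addr_bits >= align_bits);
    return addr_bits - align_bits + 1;          /* +1 for the type bit */
}
```

For 2 GB of memory and 4-byte PTEs this gives 31 - 2 + 1 = 30, matching the figure above; a 64 MB machine would need only 25.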

The bits available (looking at include/asm-*/pgtable.h) are:
alpha: 32 (plus more, I think - the code doesn't try too hard.)
arm/proc-armo: 31
arm/proc-armv: 30
i386: 30
m68k: 30 (old), 27 (new)
mips: 24
ppc: 31
sparc: 25
sparc64: 51

We seem to be doing okay, except for the MIPS and Sparc ports, and maybe
the code isn't as aggressive as it could be there.

> Under the scheme I'm planning on
> implementing, this is a non issue: all pages are tied to an inode. 
> Alternatively, we could pull i_mmap & co out of struct inode and make a
> vmstore (or whatever) object as I believe Eric suggested. 

What's nice is the low overhead of the current scheme; there's no space
allocated for bookkeeping except two bytes of swap map per swap page.

You need to maintain a more complex structure, I think.
-- 
	-Colin

* Re: Why don't shared anonymous mappings work?
@ 1999-01-13 21:31 Colin Plumb
  1999-01-19 14:32 ` Stephen C. Tweedie
  0 siblings, 1 reply; 11+ messages in thread
From: Colin Plumb @ 1999-01-13 21:31 UTC (permalink / raw)
  To: sct; +Cc: linux-mm

Um, I just thought of another problem with shared anonymous pages.
It's similar to the zero-page issue you raised, but it's no longer
a single special case.

Copy-on-write and shared mappings.  Let's say that process 1 has a COW
copy of page X.  Then the page is shared (via mmap /proc/1/mem or some
such) with process 2.  Now process 1 writes to the page.

While copying the page, we have to update process 2's PTE to point to the new copy.
But we have no data structure to keep track of the sharing structure.

It's a fairly simple structure, really, since a given physical page maps
(via COW) to one or more separate logical pages, which are in turn each
mapped into one or more memory maps.  Because of the "one or more", you
can hope to integrate it into another structure, but ugh.  For n copies,
you need n-1 non-null pointers.  That's a lot of null pointers when n
has its usual value of 1.

One possible fix is to consider multiple mappings of a logical page to
be a write for the purposes of COW copying.  That way, each physical
page is *either* COW-mapped to multiple logical pages, each present in
*one* mmap, or corresponds to one logical page which is present in one
or more maps.  This reduces the tree down to one level, and a type bit
in the page structure will do.

It *is* possible to link PTE entries together in a singly-linked list
where a pointer to another PTE is distinguishable from a pointer to
a disk block or a valid PTE.  I have thought of using this to update
more PTEs when a page is swapped in, as the swapper-in would traverse
the list to find the page at the end, swap it in if necessary, and
copy the mapping to all the entries it traversed.

Come to think of it, this *could* be used for zero-mapped pages.
Make it a circularly linked list.  (You could even distinguish
circular and non-circular lists if you need both.)  Then when
the page is accessed, allocate it and copy the pointer to all the
other PTEs on the list.
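
A rough model of the non-circular swap-in traversal might look like this. It is a sketch under stated assumptions: PTEs are modelled as entries in an array, links are array indices rather than encoded pointers, and fake_swap_in() is a hypothetical stand-in for reading the page from disk. The circular zero-page variant is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum slot_kind { SLOT_PRESENT, SLOT_SWAP, SLOT_LINK };

/* One modelled PTE. */
struct slot {
    enum slot_kind kind;
    uint32_t value; /* frame number, swap offset, or index of next slot */
};

/* Hypothetical stand-in for swapping a page in; returns a frame number. */
static uint32_t fake_swap_in(uint32_t swap_offset)
{
    return 1000 + swap_offset;
}

/*
 * Fault on table[start]: follow the links to the entry at the end of the
 * list, swap the page in if it is not present, then copy the mapping into
 * every entry that was traversed.  Assumes the list is not circular.
 */
static uint32_t resolve_chain(struct slot *table, size_t start)
{
    size_t i = start;
    uint32_t frame;

    /* First pass: find the end of the list. */
    while (table[i].kind == SLOT_LINK)
        i = table[i].value;

    if (table[i].kind == SLOT_SWAP) {
        table[i].value = fake_swap_in(table[i].value);
        table[i].kind = SLOT_PRESENT;
    }
    frame = table[i].value;

    /* Second pass: overwrite each traversed link with the real mapping. */
    i = start;
    while (table[i].kind == SLOT_LINK) {
        size_t next = table[i].value;

        table[i].kind = SLOT_PRESENT;
        table[i].value = frame;
        i = next;
    }
    return frame;
}
```

Only the entries on the path from the faulting PTE are updated; an entry that links into the list from elsewhere is resolved when it takes its own fault.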

Do you have any other ideas?
-- 
	-Colin

[parent not found: <199901061523.IAA14788@nyx10.nyx.net>]
* Why don't shared anonymous mappings work?
@ 1999-01-05 12:51 Colin Plumb
  1999-01-06  4:05 ` Eric W. Biederman
  0 siblings, 1 reply; 11+ messages in thread
From: Colin Plumb @ 1999-01-05 12:51 UTC (permalink / raw)
  To: linux-mm

I was trying to explain to Paul R. Wilson why shared anonymous mappings
don't work and started having problems proving that it couldn't
work without significant software changes.  Could someone check me
on the following logic which purports to explain how it could be done?

Shared anonymous mappings are easy as long as the page never leaves
memory.

In any given PTE, a page is either "in" or "out".  If "out", the PTE
is marked invalid but holds a swap offset where the page can be found.

A page may be both in memory and in swap.  In this case, the page cache
(indexed on inode and offset) is used to implement a swap cache, using
a special swapper_inode and the swap offset.  Once a page is in the
swap cache, mappings in both directions are efficient.  In memory, the
struct page contains the swap offset, and given the swap offset, the
page cache hash table will efficiently find the struct page (if any).
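
The two-way mapping can be illustrated with a toy hash table. This is a simplified sketch, not the kernel's page cache: the structure fields and function names are invented for the example, and the hash is keyed on the swap offset alone because the inode is always the implicit swapper inode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BUCKETS 64

/* Simplified stand-in for struct page; the field names are assumptions. */
struct page {
    uint32_t swap_offset;   /* memory -> swap direction */
    int in_swap_cache;
    struct page *hash_next; /* chain within one hash bucket */
};

static struct page *swap_hash[HASH_BUCKETS];

static size_t hash_offset(uint32_t offset)
{
    return offset % HASH_BUCKETS;
}

/* Enter a page into the swap cache under the given swap offset. */
static void swap_cache_add(struct page *page, uint32_t offset)
{
    size_t bucket = hash_offset(offset);

    page->swap_offset = offset;
    page->in_swap_cache = 1;
    page->hash_next = swap_hash[bucket];
    swap_hash[bucket] = page;
}

/* Swap -> memory direction: find the cached page for an offset, if any. */
static struct page *swap_cache_lookup(uint32_t offset)
{
    struct page *p;

    for (p = swap_hash[hash_offset(offset)]; p != NULL; p = p->hash_next)
        if (p->swap_offset == offset)
            return p;
    return NULL;
}
```

Going from a page to its swap slot is a field read, and going from a swap offset found in a PTE to the in-memory page is one hash lookup.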

When a page is put into the swap cache, it is marked read-only.
Since there is only one PTE referencing it, all attempts to modify the
page will be trapped, and the swap cache entry invalidated.
(The call is in do_wp_page in mm/memory.c.)

But why can't we allow writeable swap-cached pages (which engender
dirty swap-cached pages, of course)?  If a swap-cached page is dirty,
its disk data is invalid, but the address is still kept because
it may be in some PTEs.  (At least until the swap_map reaches 0.)

It seems that the handling would be just like mapped files as long
as you maintained a swap file entry for the page.  There are differences,
such as the fact that when mapping a not-present page, the inode is
implicit in the fact that it's an anonymous vma range and the offset
is taken from the PTE, rather than being derived from the VMA
offset.

There is some hairy magic relating to closing off all of the writeable
mappings of a page before it can be written to disk and marked clean,
but I presume those are handled for file mappings.

Basically, a dirty swap-cached page would only be written when it
was removed from the *last* process's PTE.  A less-efficient way would
be to write it out each time a dirty mapping is removed.  (This seems
to be what happens to dirty pages in filemap_swapout.  Ideally,
as long as there are writeable mappings, it should just copy the
reference's dirty bit to the page and then write the page if needed
when the last writeable mapping is removed.)
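
The "write only on the last unmap" policy could be modelled like this. It is a toy sketch: the structure, its fields, and the write counter are assumptions made for illustration, and the actual write to swap is reduced to incrementing a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a swap-cached page that may be mapped writeable. */
struct cached_page {
    unsigned map_count; /* PTEs that currently map the page */
    bool dirty;         /* accumulated from the PTE dirty bits */
    unsigned writes;    /* times the page was written to swap */
};

/*
 * Remove one mapping.  The PTE's dirty bit is folded into the page, and
 * the page is written to its swap slot only when the last mapping goes
 * away and the page is dirty.
 */
static void unmap_one(struct cached_page *page, bool pte_dirty)
{
    assert(page->map_count > 0);

    if (pte_dirty)
        page->dirty = true;

    page->map_count--;
    if (page->map_count == 0 && page->dirty) {
        page->writes++; /* stand-in for the actual write to swap */
        page->dirty = false;
    }
}
```

With three mappings of which one is dirty, the page is written once, on the third unmap, rather than at the moment the dirty mapping is removed.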

The only significant difference is that in do_wp_page, you don't
remove the page from the swap cache if there are other references to it
in swap_map.

Okay, now... I'm sure this is not some brilliant insight that everyone
else has missed.  Could someone tell me the part *I* missed about why it
won't work?
-- 
	-Colin


end of thread, other threads:[~1999-01-19 15:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-01-14  3:07 Why don't shared anonymous mappings work? Colin Plumb
1999-01-15  6:07 ` Benjamin C.R. LaHaise
  -- strict thread matches above, loose matches on Subject: below --
1999-01-15  6:43 Colin Plumb
1999-01-13 21:31 Colin Plumb
1999-01-19 14:32 ` Stephen C. Tweedie
1999-01-19 15:23   ` Eric W. Biederman
     [not found] <199901061523.IAA14788@nyx10.nyx.net>
1999-01-06 19:51 ` Eric W. Biederman
1999-01-07  5:55   ` Eric W. Biederman
1999-01-13 20:21     ` Stephen C. Tweedie
1999-01-05 12:51 Colin Plumb
1999-01-06  4:05 ` Eric W. Biederman
