Re: Why don't shared anonymous mappings work?

linux-mm.kvack.org archive mirror

* Re: Why don't shared anonymous mappings work?
@ 1999-01-14  3:07 Colin Plumb
  1999-01-15  6:07 ` Benjamin C.R. LaHaise
  0 siblings, 1 reply; 11+ messages in thread
From: Colin Plumb @ 1999-01-14  3:07 UTC
  To: blah; +Cc: linux-mm

Ben LaHaise wrote:
> On Wed, 13 Jan 1999, Colin Plumb wrote:
> 
>> Um, I just thought of another problem with shared anonymous pages.
>> It's similar to the zero-page issue you raised, but it's no longer
>> a single special case.
>> 
>> Copy-on-write and shared mappings.  Let's say that process 1 has a COW
>> copy of page X.  Then the page is shared (via mmap /proc/1/mem or some
>> such) with process 2.  Now process 1 writes to the page.
> 
> Shared mappings *cannot* be COW.  The mmap(/proc/<pid>/mem) was killed for
> good reasons, and if it's truly necessary, I suggest that mmap enforce
> that the mapping is of the same (or compatible in the case of files)
> nature.  Anything less overcomplicates things needlessly, leading to a
> Mach like vm system.

Um, okay, how about a more plausible scenario.  Processes 1 and 2
share a page X.  Process 1 forks.

Doesn't this lead to the hairy Mach-like situation?

>> It *is* possible to link PTE entries together in a singly-linked list
>> where a pointer to another PTE is distinguishable from a pointer to
>> a disk block or a valid PTE.  I have thought of using this to update
>> more PTEs when a page is swapped in, as the swapper-in would traverse
>> the list to find the page at the end, swap it in if necessary, and
>> copy the mapping to all the entries it traversed.
> 
> Been there, done it, got the t-shirt.  Creating actual lists for ptes is a
> Bad Idea (tm), as I learned in my original pte_list patch.  It doubles the
> size of the page tables and is *slow*.  Plus, as Linus pointed out a long
> time ago, we've got the information already -- and there's an easy way to
> get it for the anonymous case too (it just requires a wee bit 'o
> bookkeeping on anonymous vmas).  I'm in the process of updating the patch
> now that things are stabilizing, so stay tuned...

Um, I think you fail to understand.  I was talking about a linked list
*without* allocating extra space.  The idea is that I don't know of a
processor that requires more than 2 bits (M68K) to mark a PTE as invalid;
the user gets the rest.  Currently the user bits in the invalid PTE
encodings point to swap pages.  You could steal one bit and point to
either a word in memory or a swap page.

Because memory does not fill the entire virtual address space and has
alignment constraints, the number of bits needed for the pointer leaves
room for the invalid flag bit(s) and the memory/swap pointer-type bit.
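
To make the idea concrete, here is a minimal C sketch of an invalid-PTE encoding that tells "points at another PTE" apart from "holds a swap entry". Everything in it is an assumption for illustration: a 32-bit PTE, a port where a single bit marks the PTE invalid, 4-byte-aligned PTEs, and the names pte_t, PTE_LINK, mk_link_pte and friends. It is not taken from any port's pgtable.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit PTE layout (not any real port's):
 *   bit 0     present: 1 = valid hardware mapping
 *   bit 1     only meaningful when bit 0 is clear:
 *             0 = payload is a swap entry
 *             1 = payload is the physical address of another PTE
 *   bits 2-31 payload (30 bits)
 */
typedef uint32_t pte_t;

#define PTE_PRESENT 0x1u
#define PTE_LINK    0x2u
#define PTE_FLAGS   (PTE_PRESENT | PTE_LINK)

static bool pte_present(pte_t p) { return (p & PTE_PRESENT) != 0; }
static bool pte_is_link(pte_t p) { return !pte_present(p) && (p & PTE_LINK) != 0; }
static bool pte_is_swap(pte_t p) { return !pte_present(p) && (p & PTE_LINK) == 0; }

/* PTEs are 4-byte aligned, so the low two bits of a PTE's physical
 * address are always zero and can carry the flags instead. */
static pte_t mk_link_pte(uint32_t pte_phys)
{
    assert((pte_phys & PTE_FLAGS) == 0);
    return pte_phys | PTE_LINK;
}

static uint32_t link_target(pte_t p) { return p & ~PTE_FLAGS; }

/* Swap entries are shifted up past the two flag bits. */
static pte_t mk_swap_pte(uint32_t entry)
{
    assert(entry < (UINT32_C(1) << 30));
    return entry << 2;
}

static uint32_t swap_entry(pte_t p) { return p >> 2; }
```

With this layout, mk_link_pte(0x1000) yields 0x1002, a non-present entry whose target decodes back to 0x1000, while mk_swap_pte(12345) decodes back to swap entry 12345.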

I am quite aware that enlarging all of the PTEs in the system is a cost to
be avoided if at all possible.
-- 
	-Colin
--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org

* Re: Why don't shared anonymous mappings work?
  1999-01-14  3:07 Why don't shared anonymous mappings work? Colin Plumb
@ 1999-01-15  6:07 ` Benjamin C.R. LaHaise
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin C.R. LaHaise @ 1999-01-15  6:07 UTC
  To: Colin Plumb; +Cc: linux-mm

On Wed, 13 Jan 1999, Colin Plumb wrote:

> Um, okay, how about a more plausible scenario.  Processes 1 and 2
> share a page X.  Process 1 forks.
> 
> Doesn't this lead to the hairy Mach-like situation?

Nope, the new process inherits the mapping with the shared attribute
intact.
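
Ben's point can be checked from user space on a kernel that implements shared anonymous mappings. According to the mmap(2) manual page, Linux supports MAP_SHARED together with MAP_ANONYMOUS only since kernel 2.4, so this would not have worked on the kernels discussed in this thread. The sketch below (the function name is made up) maps one shared anonymous int, forks, lets the child store into it, and checks that the parent sees the store.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a value the child stores through a shared anonymous
 * mapping is visible to the parent, 0 if not, -1 on error. */
static int child_write_visible_to_parent(void)
{
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        /* Child: the mapping was inherited with MAP_SHARED intact,
         * so this store lands in the same physical page. */
        *shared = 42;
        _exit(0);
    }

    int status;
    int ok = waitpid(pid, &status, 0) == pid
             && WIFEXITED(status) && WEXITSTATUS(status) == 0;
    int seen = ok ? (*shared == 42) : -1;
    munmap(shared, sizeof *shared);
    return seen;
}
```

A return value of 1 means the child's write reached the parent; with MAP_PRIVATE instead of MAP_SHARED the child would get its own copy-on-write page and the function would return 0.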

> Um, I think you fail to understand.  I was talking about a linked list
> *without* allocating extra space.  The idea is that I don't know of a
> processor that requires more than 2 bits (M68K) to mark a PTE as invalid;
> the user gets the rest.  Currently the user bits in the invalid PTE
> encodings point to swap pages.  You could steal one bit and point to
> either a word in memory or a swap page.

Ooops, brain fart (sometimes you read, but the meaning just isn't
absorbed).  I think assuming that you can get 30 bits out of a pte on a 32
bit platform to use as a pointer is pushing things, though (and you do
need all the bits: mremap allows users to move shared pages to different
offsets within a page table).  Under the scheme I'm planning on
implementing, this is a non-issue: all pages are tied to an inode.
Alternatively, we could pull i_mmap & co out of struct inode and make a
vmstore (or whatever) object as I believe Eric suggested. 

		-ben

* Re: Why don't shared anonymous mappings work?
@ 1999-01-15  6:43 Colin Plumb
  0 siblings, 0 replies; 11+ messages in thread
From: Colin Plumb @ 1999-01-15  6:43 UTC
  To: blah; +Cc: linux-mm

>> Um, I think you fail to understand.  I was talking about a linked list
>> *without* allocating extra space.  The idea is that I don't know of a
>> processor that requires more than 2 bits (M68K) to mark a PTE as invalid;
>> the user gets the rest.  Currently the user bits in the invalid PTE
>> encodings point to swap pages.  You could steal one bit and point to
>> either a word in memory or a swap page.

> Ooops, brain fart (sometimes you read, but the meaning just isn't
> absorbed).  I think assuming that you can get 30 bits out of a pte on a 32
> bit platform to use as a pointer is pushing things, though (and you do
> need all the bits: mremap allows users to move shared pages to different
> offsets within a page table).

Not quite.  You only need as many bits as are needed to address all of
physical memory minus the number of bits implied by PTE alignment.

So, for a 2 GB machine, you need 29 bits for a pointer to a 32-bit word.
Plus one for the type bit does equal 30, but many machines are smaller.
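
The same arithmetic as a small C sketch (function names are hypothetical). Memory size and PTE alignment are passed as powers of two, so a 2 GB machine with 4-byte PTEs is mem_log2 = 31 and pte_align_log2 = 2.

```c
#include <assert.h>
#include <stdbool.h>

/* Bits needed to name any PTE in physical memory: log2 of the memory
 * size, minus the low address bits that PTE alignment forces to zero. */
static unsigned pte_pointer_bits(unsigned mem_log2, unsigned pte_align_log2)
{
    assert(mem_log2 >= pte_align_log2);
    return mem_log2 - pte_align_log2;
}

/* One more bit distinguishes "pointer to a PTE" from "swap entry". */
static unsigned pte_link_bits(unsigned mem_log2, unsigned pte_align_log2)
{
    return pte_pointer_bits(mem_log2, pte_align_log2) + 1;
}

/* Does the scheme fit in the bits an invalid PTE leaves free? */
static bool link_scheme_fits(unsigned mem_log2, unsigned pte_align_log2,
                             unsigned free_bits)
{
    return pte_link_bits(mem_log2, pte_align_log2) <= free_bits;
}
```

Here pte_link_bits(31, 2) gives 29 + 1 = 30, which fits the 30 free bits listed for i386 but not the 24 listed for MIPS; a 64 MB machine (mem_log2 = 26) needs 25 bits, enough for the Sparc figure but still not for MIPS.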

The bits available (looking at include/asm-*/pgtable.h) are:
alpha: 32 (plus more, I think - the code doesn't try too hard.)
arm/proc-armo: 31
arm/proc-armv: 30
i386: 30
m68k: 30 (old), 27 (new)
mips: 24
ppc: 31
sparc: 25
sparc64: 51

We seem to be doing okay, except for the MIPS and Sparc ports, and maybe
the code isn't as aggressive as it could be there.

> Under the scheme I'm planning on
> implementing, this is a non-issue: all pages are tied to an inode.
> Alternatively, we could pull i_mmap & co out of struct inode and make a
> vmstore (or whatever) object as I believe Eric suggested. 

What's nice is the low overhead of the current scheme; there's no space
allocated for bookkeeping except two bytes of swap map per swap page.

You need to maintain a more complex structure, I think.
-- 
	-Colin

* Re: Why don't shared anonymous mappings work?
@ 1999-01-13 21:31 Colin Plumb
  1999-01-19 14:32 ` Stephen C. Tweedie
  0 siblings, 1 reply; 11+ messages in thread
From: Colin Plumb @ 1999-01-13 21:31 UTC
  To: sct; +Cc: linux-mm

Um, I just thought of another problem with shared anonymous pages.
It's similar to the zero-page issue you raised, but it's no longer
a single special case.

Copy-on-write and shared mappings.  Let's say that process 1 has a COW
copy of page X.  Then the page is shared (via mmap /proc/1/mem or some
such) with process 2.  Now process 1 writes to the page.

While copying the page, we have to update process 2's pte to point to the new copy.
But we have no data structure to keep track of the sharing structure.

It's a fairly simple structure, really, since a given physical page maps
(via COW) to one or more separate logical pages, which are in turn each
mapped into one or more memory maps.  Because of the "one or more", you
can hope to integrate it into another structure, but ugh.  For n copies,
you need n-1 non-null pointers.  That's a lot of null pointers when n
has its usual value of 1.

One possible fix is to consider multiple mappings of a logical page to
be a write for the purposes of COW copying.  That way, each physical
page is *either* COW-mapped to multiple logical pages, each present in
*one* mmap, or corresponds to one logical page which is present in one
or more maps.  This reduces the tree down to one level, and a type bit
in the page structure will do.
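
A minimal model of this one-level rule, assuming hypothetical names (struct phys_page, share_logical_page) and leaving out the actual data copy and PTE updates. The `map_shared` field plays the role of the type bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state under the one-level rule.  A physical
 * page is EITHER shared copy-on-write by several logical pages (each
 * present in exactly one mm) OR backs a single logical page that is
 * mapped into several mms.  One type bit tells the cases apart. */
struct phys_page {
    bool map_shared;  /* the type bit: false = COW-shared */
    int  users;       /* logical pages if COW-shared, mms if map-shared */
};

/* Map an existing logical page into one more mm.  Following the rule
 * above, this counts as a write for COW purposes: if the page is still
 * COW-shared, the logical page gets its own copy in `spare` first.
 * Returns the physical page the logical page now lives in. */
static struct phys_page *share_logical_page(struct phys_page *pg,
                                            struct phys_page *spare)
{
    if (!pg->map_shared && pg->users > 1) {
        pg->users--;               /* this logical page leaves the COW set */
        spare->map_shared = true;
        spare->users = 2;          /* the original mm plus the new one */
        return spare;
    }
    pg->map_shared = true;         /* sole owner, or already map-shared */
    pg->users++;
    return pg;
}
```

A COW-shared page with two users hands back the spare page (now map-shared by two mms), while a page with a single user is promoted in place.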

It *is* possible to link PTE entries together in a singly-linked list
where a pointer to another PTE is distinguishable from a pointer to
a disk block or a valid PTE.  I have thought of using this to update
more PTEs when a page is swapped in, as the swapper-in would traverse
the list to find the page at the end, swap it in if necessary, and
copy the mapping to all the entries it traversed.
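
A sketch of that swap-in traversal, under stated assumptions: a single present bit plus a link bit, PTE addresses modelled as indices into one array, and an acyclic chain. resolve_chain and demo_swap_in are hypothetical names; demo_swap_in merely pretends to read swap entry e into frame e + 100.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: bit 0 = present, bit 1 = link; payload in
 * bits 2-31.  A link's payload is an index into ptes[], standing in
 * for the physical address of another PTE. */
typedef uint32_t pte_t;
#define PTE_PRESENT 0x1u
#define PTE_LINK    0x2u

static pte_t mk_present(uint32_t pfn) { return (pfn << 2) | PTE_PRESENT; }
static pte_t mk_link(uint32_t idx)    { return (idx << 2) | PTE_LINK; }
static pte_t mk_swap(uint32_t entry)  { return entry << 2; }

static int is_link(pte_t p) { return !(p & PTE_PRESENT) && (p & PTE_LINK); }

/* Stand-in for the real swap-in path: pretend swap entry e is read
 * into page frame e + 100. */
static uint32_t demo_swap_in(uint32_t entry) { return entry + 100; }

/* Fault on ptes[start]: walk the chain to its end, swap the page in if
 * the end is a swap entry, then copy the resulting mapping into every
 * PTE walked, so those PTEs will not fault again. */
static pte_t resolve_chain(pte_t *ptes, uint32_t start,
                           uint32_t (*swap_in)(uint32_t))
{
    uint32_t i = start;
    while (is_link(ptes[i]))
        i = ptes[i] >> 2;

    pte_t final = ptes[i];
    if (!(final & PTE_PRESENT))
        final = mk_present(swap_in(final >> 2));

    /* Second pass: follow the old links while overwriting them. */
    for (uint32_t j = start;;) {
        pte_t old = ptes[j];
        ptes[j] = final;
        if (!is_link(old))
            break;
        j = old >> 2;
    }
    return final;
}
```

For a chain 0 -> 1 -> 2 where entry 2 holds swap entry 5, one fault on entry 0 swaps the page in once and leaves all three entries mapping frame 105. The second pass re-reads each old entry before overwriting it, which is why no extra storage is needed to remember the path.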

Come to think of it, this *could* be used for zero-mapped pages.
Make it a circularly linked list.  (You could even distinguish
circular and non-circular lists if you need both.)  Then when
the page is accessed, allocate it and copy the pointer to all the
other PTEs on the list.
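
The circular variant for zero-mapped pages can be sketched the same way. Again the encoding and the name fill_zero_ring are assumptions; the caller is assumed to have allocated and zeroed the new frame, and every member of the ring is assumed to still be a link entry.

```c
#include <assert.h>
#include <stdint.h>

/* Same hypothetical encoding: bit 0 = present, bit 1 = link, payload
 * in bits 2-31; links hold indices into ptes[]. */
typedef uint32_t pte_t;
#define PTE_PRESENT 0x1u
#define PTE_LINK    0x2u

static pte_t mk_present(uint32_t pfn) { return (pfn << 2) | PTE_PRESENT; }
static pte_t mk_link(uint32_t idx)    { return (idx << 2) | PTE_LINK; }

/* Every PTE of a not-yet-touched shared zero page sits on a ring of
 * links.  On first access the caller allocates a frame; this walks the
 * ring once, from the faulting PTE back around to it, installing the
 * frame in every member. */
static pte_t fill_zero_ring(pte_t *ptes, uint32_t start, uint32_t new_pfn)
{
    pte_t present = mk_present(new_pfn);
    uint32_t i = start;
    do {
        assert(!(ptes[i] & PTE_PRESENT) && (ptes[i] & PTE_LINK));
        uint32_t next = ptes[i] >> 2;
        ptes[i] = present;
        i = next;
    } while (i != start);
    return present;
}
```

Because the ring has no terminating entry, the walk stops when it returns to the PTE that faulted, so it does not matter which member takes the first access.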

Do you have any other ideas?
-- 
	-Colin

* Re: Why don't shared anonymous mappings work?
  1999-01-13 21:31 Colin Plumb
@ 1999-01-19 14:32 ` Stephen C. Tweedie
  1999-01-19 15:23   ` Eric W. Biederman
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen C. Tweedie @ 1999-01-19 14:32 UTC
  To: Colin Plumb; +Cc: sct, linux-mm

Hi,

On Wed, 13 Jan 1999 14:31:41 -0700 (MST), Colin Plumb <colin@nyx.net>
said:

> Um, I just thought of another problem with shared anonymous pages.
> It's similar to the zero-page issue you raised, but it's no longer
> a single special case.

> Copy-on-write and shared mappings.  Let's say that process 1 has a COW
> copy of page X.  Then the page is shared (via mmap /proc/1/mem or some
> such) with process 2.  Now process 1 writes to the page.

Invalid argument.  This is *precisely* why mmap of /proc/X/mem is
broken.  We don't need to implement reasonable semantics for that case,
because there _are_ no reasonable semantics for a page which can be both
MAP_PRIVATE and MAP_SHARED in the same process.

--Stephen

* Re: Why don't shared anonymous mappings work?
  1999-01-19 14:32 ` Stephen C. Tweedie
@ 1999-01-19 15:23   ` Eric W. Biederman
  0 siblings, 0 replies; 11+ messages in thread
From: Eric W. Biederman @ 1999-01-19 15:23 UTC
  To: Stephen C. Tweedie; +Cc: Colin Plumb, linux-mm

>>>>> "ST" == Stephen C Tweedie <sct@redhat.com> writes:

ST> Hi,
ST> On Wed, 13 Jan 1999 14:31:41 -0700 (MST), Colin Plumb <colin@nyx.net>
ST> said:

>> Um, I just thought of another problem with shared anonymous pages.
>> It's similar to the zero-page issue you raised, but it's no longer
>> a single special case.

>> Copy-on-write and shared mappings.  Let's say that process 1 has a COW
>> copy of page X.  Then the page is shared (via mmap /proc/1/mem or some
>> such) with process 2.  Now process 1 writes to the page.

ST> Invalid argument.  This is *precisely* why mmap of /proc/X/mem is
ST> broken.  We don't need to implement reasonable semantics for that case,
ST> because there _are_ no reasonable semantics for a page which can be both
ST> MAP_PRIVATE and MAP_SHARED in the same process.

Thank you for stomping on this, and my apologies for bringing it up a while ago.

Dosemu keeps coming to my mind.  For 2.3 we need a better version of shared
memory for dosemu to use.  shm is fine but it's not flexible enough.

For multiple MAP_PRIVATE & MAP_SHARED mappings, the most we can
theoretically allow is:
o If a page is updated, we need to update at most the page table entry the write came from.
o If a write did not come from a page table entry, we need to update no page table entries.
o During a mapping we need to update at most one old pte per page, and old
  ptes that are updated must be in the same mm.

With those guidelines the best we can allow for /proc/self/mem is to
promote the page into a shared anonymous mapping, or fail.

Anything else would break the guidelines above, which are what we need
if we want to avoid reverse page table entries (a reasonable goal).

Eric

[parent not found: <199901061523.IAA14788@nyx10.nyx.net>]

* Re: Why don't shared anonymous mappings work?
       [not found] <199901061523.IAA14788@nyx10.nyx.net>
@ 1999-01-06 19:51 ` Eric W. Biederman
  1999-01-07  5:55   ` Eric W. Biederman
  0 siblings, 1 reply; 11+ messages in thread
From: Eric W. Biederman @ 1999-01-06 19:51 UTC
  To: Colin Plumb; +Cc: linux-mm

>>>>> "CP" == Colin Plumb <colin@nyx.net> writes:

>> Take a page and map it into two processes.
>> Swap the page out from both processes to disk.
>> The swap address is now in the ptes.
>> Bring that page into process 1.
>> Dirty the page, thus causing a new swap entry to be allocated.
>> (the write-once rule)
>> Swap the page out of process 1.
>> 
>> Oops: process 1 and process 2 have different ptes for the same
>> page.
>> 
>> Since we don't have any form of reverse page table entry,
>> preventing that last case is difficult to do efficiently.

CP> Um, but what if, as I was suggesting, we *don't* allocate a new swap
CP> entry when the page is dirtied?  That is, when do_wp_page sees that the
CP> page is in the swap cache, it looks at swap_map, sees that it is greater
CP> than 2, and leaves it as a writeable swap-cached page.

Sorry, I must have misread that part.

I guess the final trick would be to make sure we always bring a shared
memory area into a process's address space, so that we can ensure
it will get swapped out and the pte put in the process's address space.

Currently vmas won't merge unless their offsets are contiguous, which
we can't guarantee for swap space, and having multiple vmas would be a real pain.

Handling /proc/self/mem mappings into the same process correctly
could be interesting, however, because the definition of private and
shared gets a little muddled....

The only remaining reasons I can think of why it isn't there
are that
a) no one wrote the code
b) it is very close to 2.2

Eric

* Re: Why don't shared anonymous mappings work?
  1999-01-06 19:51 ` Eric W. Biederman
@ 1999-01-07  5:55   ` Eric W. Biederman
  1999-01-13 20:21     ` Stephen C. Tweedie
  0 siblings, 1 reply; 11+ messages in thread
From: Eric W. Biederman @ 1999-01-07  5:55 UTC
  To: Colin Plumb, linux-mm

And of course the last reason I just thought of, which is probably the real reason.

Currently, anonymous pages, if they are writable, are assumed to have exactly
one mapping; if a page is in the swap cache, it is assumed to be read-only.

So reusing the swap inode could be a real problem.

Eric

* Re: Why don't shared anonymous mappings work?
  1999-01-07  5:55   ` Eric W. Biederman
@ 1999-01-13 20:21     ` Stephen C. Tweedie
  0 siblings, 0 replies; 11+ messages in thread
From: Stephen C. Tweedie @ 1999-01-13 20:21 UTC
  To: Eric W. Biederman; +Cc: Colin Plumb, linux-mm, Stephen Tweedie

Hi,

On 06 Jan 1999 23:55:03 -0600, ebiederm+eric@ccr.net (Eric W. Biederman)
said:

> And of course the last reason I just thought of, which is probably the
> real reason.  Currently, anonymous pages, if they are writable, are
> assumed to have exactly one mapping; if a page is in the swap cache, it
> is assumed to be read-only.

> So reusing the swap inode could be a real problem.

Yes.  The _only_ reason we can't do anonymous pages right now is the
VM's assumption that all swap cache pages are read-only.  Once we relax
that, the only thing left is the initialisation of anonymous page ptes
(remembering that when we fill in a demand-zero anonymous shared page,
we will have to update that page's pte in every mm which shares the
page).  Other than that, allowing writable swap-cache pages is all that
is required.  It's just too much of a potential destabiliser to add this
close to 2.2.0.

--Stephen

* Why don't shared anonymous mappings work?
@ 1999-01-05 12:51 Colin Plumb
  1999-01-06  4:05 ` Eric W. Biederman
  0 siblings, 1 reply; 11+ messages in thread
From: Colin Plumb @ 1999-01-05 12:51 UTC
  To: linux-mm

I was trying to explain to Paul R. Wilson why shared anonymous mappings
don't work and started having problems proving that it couldn't
work without significant software changes.  Could someone check me
on the following logic which purports to explain how it could be done?

Shared anonymous mappings are easy as long as the page never leaves
memory.

In any given PTE, a page is either "in" or "out".  If "out", the PTE
is marked invalid but holds a swap offset where the page can be found.

A page may be both in memory and in swap.  In this case, the page cache
(indexed on inode and offset) is used to implement a swap cache, using
a special swapper_inode and the swap offset.  Once a page is in the
swap cache, mappings in both directions are efficient.  In memory, the
struct page contains the swap offset, and given the swap offset, the
page cache hash table will efficiently find the struct page (if any).
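
The two-way lookup can be modelled in a few lines of C. This is a simplified sketch, not the kernel's page cache: struct page and struct inode are cut down to the fields used here, and CACHE_HASH_SIZE, cache_hashfn, cache_add and cache_lookup are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_HASH_SIZE 64      /* hypothetical table size */

struct inode { int unused; };   /* only its address matters here */
static struct inode swapper_inode;

/* Cut-down struct page: the reverse direction (page to swap offset)
 * is just a field. */
struct page {
    const struct inode *inode;
    unsigned long offset;
    struct page *next_hash;
};

static struct page *cache_hash[CACHE_HASH_SIZE];

static size_t cache_hashfn(const struct inode *inode, unsigned long offset)
{
    return ((uintptr_t)inode / sizeof(struct inode) + offset) % CACHE_HASH_SIZE;
}

/* File a page under (inode, offset), e.g. (&swapper_inode, swap offset). */
static void cache_add(struct page *pg, const struct inode *inode,
                      unsigned long offset)
{
    size_t h = cache_hashfn(inode, offset);
    pg->inode = inode;
    pg->offset = offset;
    pg->next_hash = cache_hash[h];
    cache_hash[h] = pg;
}

/* Forward direction: (inode, offset) to page, or NULL if not cached. */
static struct page *cache_lookup(const struct inode *inode,
                                 unsigned long offset)
{
    for (struct page *p = cache_hash[cache_hashfn(inode, offset)]; p;
         p = p->next_hash)
        if (p->inode == inode && p->offset == offset)
            return p;
    return NULL;
}
```

Offset to page goes through the hash chain; page to offset is the `offset` field. Swap-cache pages are simply the ones filed under &swapper_inode.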

When a page is put into the swap cache, it is marked read-only.
Since there is only one PTE referencing it, all attempts to modify the
page will be trapped, and the swap cache entry invalidated.
(The call is in do_wp_page in mm/memory.c.)

But why can't we allow writeable swap-cached pages?  They would, of course,
engender dirty swap-cached pages.  If a swap-cached page is dirty,
its disk data is invalid, but the address is still kept because
it may be in some PTEs.  (At least until the swap_map reaches 0.)

It seems that the handling would be just like mapped files as long
as you maintained a swap file entry for the page.  There are differences,
such as the fact that when mapping a not-present page, the inode is
implicit in the fact that it's an anonymous vma range and the offset
is taken from the PTE, rather than being derived from the VMA
offset.

There is some hairy magic relating to closing off all of the writeable
mappings of a page before it can be written to disk and marked clean,
but I presume those are handled for file mappings.

Basically, a dirty swap-cached page would only be written when it
was removed from the *last* process's PTE.  A less-efficient way would
be to write it out each time a dirty mapping is removed.  (This seems
to be what happens to dirty pages in filemap_swapout.  Ideally,
as long as there are writeable mappings, it should just copy the
reference's dirty bit to the page and then write the page if needed
when the last writeable mapping is removed.)
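
The "write only when the last mapping goes" policy amounts to a mapping count plus a sticky dirty flag. A hypothetical sketch (struct shared_page and unmap_one are not kernel names, and the swap write is reduced to a counter):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one writeable swap-cached page. */
struct shared_page {
    int  mappings;   /* PTEs currently mapping the page */
    bool dirty;      /* dirty bits collected from PTEs removed so far */
    int  writes;     /* number of times the page was written to swap */
};

/* Remove one PTE's mapping (as the swapper would).  Its dirty bit is
 * folded into the page; the page is written only when the last mapping
 * goes away, instead of once per dirty mapping removed. */
static void unmap_one(struct shared_page *pg, bool pte_dirty)
{
    assert(pg->mappings > 0);
    if (pte_dirty)
        pg->dirty = true;
    if (--pg->mappings == 0 && pg->dirty) {
        pg->writes++;       /* stand-in for the actual swap write */
        pg->dirty = false;  /* on-disk copy is current again */
    }
}
```

Removing three mappings, two of them dirty, produces one write instead of the two that the write-per-dirty-mapping approach would issue.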

The only significant difference is that in do_wp_page, you don't
remove the page from the swap cache if there are other references to it
in swap_map.
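
As a model, the proposed change to the swap-cached branch of do_wp_page is one extra test. This is a hypothetical sketch, not the real do_wp_page: struct anon_page and wp_swap_cached are invented names, and the case where a private page is still shared with another process and must be copied is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of an anonymous page that is in the swap cache. */
struct anon_page {
    bool in_swap_cache;  /* has a swap offset, findable via swapper_inode */
    bool writable;       /* the faulting PTE may now write */
    bool dirty;          /* newer than its swap copy (only while cached) */
};

/* Write fault on a swap-cached page, following the proposal above.
 * `other_refs` counts swap_map references other than the faulting PTE
 * and the swap cache's own. */
static void wp_swap_cached(struct anon_page *pg, unsigned other_refs)
{
    assert(pg->in_swap_cache);
    if (other_refs == 0)
        pg->in_swap_cache = false;  /* existing behaviour: drop the swap copy */
    /* Otherwise other PTEs still hold this swap offset, so the page
     * stays in the swap cache and its disk copy becomes stale. */
    pg->writable = true;
    pg->dirty = pg->in_swap_cache;
}
```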

Okay, now... I'm sure this is not some brilliant insight that everyone
else has missed.  Could someone tell me the part *I* missed about why it
won't work?
-- 
	-Colin

* Re: Why don't shared anonymous mappings work?
  1999-01-05 12:51 Colin Plumb
@ 1999-01-06  4:05 ` Eric W. Biederman
  0 siblings, 0 replies; 11+ messages in thread
From: Eric W. Biederman @ 1999-01-06  4:05 UTC
  To: Colin Plumb; +Cc: linux-mm

Take a page and map it into two processes.
Swap the page out from both processes to disk.
The swap address is now in the ptes.
Bring that page into process 1.
Dirty the page, thus causing a new swap entry to be allocated.
   (the write-once rule)
Swap the page out of process 1.

Oops: process 1 and process 2 have different ptes for the same
page.
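
The scenario can be replayed with a toy model of PTEs and swap slots. This is a hypothetical sketch: swap slots are plain ints handed out by a counter, and swap_out and ptes_diverge are invented names. The only rule modelled is the write-once rule, i.e. a page dirtied after being read back from swap gets a fresh slot when it is swapped out again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a PTE is either present or holds a swap slot. */
struct pte { bool present; int swap_slot; };

static int next_free_slot = 100;   /* stand-in swap allocator */

/* Swap a page out through one PTE.  `page_slot` is the page's current
 * swap slot (-1 if none).  Under the write-once rule a page dirtied
 * since it was read back gets a fresh slot; a clean one reuses its slot. */
static int swap_out(struct pte *p, bool dirtied, int page_slot)
{
    int slot = (dirtied || page_slot < 0) ? next_free_slot++ : page_slot;
    p->present = false;
    p->swap_slot = slot;
    return slot;
}

/* Replays the steps above.  Returns true if the two PTEs end up naming
 * different swap slots for what should be one shared page. */
static bool ptes_diverge(bool dirty_in_process_1)
{
    struct pte p1 = { true, -1 }, p2 = { true, -1 };

    int slot = swap_out(&p1, false, -1);      /* out of process 1 */
    swap_out(&p2, false, slot);               /* out of process 2, same slot */

    p1.present = true;                        /* back into process 1 only */
    swap_out(&p1, dirty_in_process_1, slot);  /* out of process 1 again */

    return p1.swap_slot != p2.swap_slot;      /* process 2 has the old slot */
}
```

ptes_diverge(true) reports the mismatch, while ptes_diverge(false) does not, since a clean page may reuse its old slot. Fixing the dirty case would require finding process 2's PTE, which is exactly the reverse mapping the kernel lacks.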

Since we don't have any form of reverse page table entry,
preventing that last case is difficult to do efficiently.

Eric

Thread overview: 11+ messages
1999-01-14  3:07 Why don't shared anonymous mappings work? Colin Plumb
1999-01-15  6:07 ` Benjamin C.R. LaHaise
  -- strict thread matches above, loose matches on Subject: below --
1999-01-15  6:43 Colin Plumb
1999-01-13 21:31 Colin Plumb
1999-01-19 14:32 ` Stephen C. Tweedie
1999-01-19 15:23   ` Eric W. Biederman
     [not found] <199901061523.IAA14788@nyx10.nyx.net>
1999-01-06 19:51 ` Eric W. Biederman
1999-01-07  5:55   ` Eric W. Biederman
1999-01-13 20:21     ` Stephen C. Tweedie
1999-01-05 12:51 Colin Plumb
1999-01-06  4:05 ` Eric W. Biederman
