* Fixing private mappings
@ 1998-04-23 5:06 Eric W. Biederman
1998-04-23 8:28 ` Rik van Riel
1998-04-23 15:12 ` Benjamin C.R. LaHaise
0 siblings, 2 replies; 6+ messages in thread
From: Eric W. Biederman @ 1998-04-23 5:06 UTC (permalink / raw)
To: linux-mm
Please excuse me for thinking out loud, but private mappings seem to be
a hard problem that has not been correctly implemented in the Linux
kernel.
Definition of Private Mappings:
A private mapping is a copy-on-write mapping of a file.
That is if the file is written to after the mapping is established,
the contents of the mapping will always remain what the contents of
the file was at the time of the private mapping.
Further if another private mapping is established after one
private mapping has been established it should have the file contents
of the file at the time the mapping is established. Not at the time
any previous private mapping was established.
A few ideas occur to me for specific problems, but the whole problem
is a challenge.
What I do know is that we need some kind of write barrier that we
check to see if we have made a copy of a page for any private mappings
that may exist before we write to it.
How should we find those private mappings?
Wait. That would be: follow inode->i_mmap whenever we read in a page.
And then have code in generic_file_write, and update_vm_cache, to make
sure the copies are made at the appropriate times.
How should we maximize sharing of private mappings?
The simplest solution would be to continue with the current solution,
and just restrict mappings to 512-byte boundaries.
A slightly more generic solution would be to introduce a new ``inode''
that knew it was a copy of the old inode but at a different offset.
These new ``inodes'' would then have a linked list of their own, which
could be followed for update purposes.
--
Extra inodes for files could also be extended to allow an offset at
say 4TB or so into a file so that we can handle any sized file.
Though obviously you can't cache it all at once, but you could cache
any piece ;)
There is a possibility there for per-inode metadata too but I'm not
certain about that one.
I think since my initial goal was large file support with the common
case on intel being restricted to 32bit integers, I'll play with the
extra inodes approach.
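To make the arithmetic behind the extra-inodes idea concrete, here is a
minimal sketch in C. It is not code from the kernel or from this thread:
the names split_offset, join_offset and struct window_pos are
hypothetical, and the choice of 2^32 bytes per extra inode is an
assumption made only so that the offset inside each piece fits in a
32-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each "extra inode" covers 2^32 bytes of the file, so the
 * offset inside one piece fits in a 32-bit integer. */
#define WINDOW_SHIFT 32

struct window_pos {
    uint32_t window;  /* which extra inode (0 = the primary inode) */
    uint32_t offset;  /* byte offset inside that piece of the file */
};

/* Split a 64-bit file offset into (window, offset within window). */
struct window_pos split_offset(uint64_t file_offset)
{
    struct window_pos pos;
    uint64_t window = file_offset >> WINDOW_SHIFT;

    /* A 64-bit value shifted right by 32 always fits in 32 bits. */
    assert(window <= UINT32_MAX);
    pos.window = (uint32_t)window;
    pos.offset = (uint32_t)(file_offset & 0xffffffffu);
    return pos;
}

/* Inverse of split_offset: rebuild the 64-bit file offset. */
uint64_t join_offset(struct window_pos pos)
{
    return ((uint64_t)pos.window << WINDOW_SHIFT) | pos.offset;
}
```

Under these assumptions an offset of 4TB (2^42 bytes) lands at the start
of window 1024, and every individual piece stays addressable with 32-bit
offsets.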
It will probably be smart to restrict ourselves to still only allowing
mappings on fs block boundaries. There is some efficiency gained
there (when reading in pages that are not in memory at all), but
otherwise we should be fine.
Eric
* Re: Fixing private mappings
1998-04-23 5:06 Fixing private mappings Eric W. Biederman
@ 1998-04-23 8:28 ` Rik van Riel
1998-04-23 15:12 ` Benjamin C.R. LaHaise
1 sibling, 0 replies; 6+ messages in thread
From: Rik van Riel @ 1998-04-23 8:28 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: linux-mm
On 23 Apr 1998, Eric W. Biederman wrote:
> A slightly more generic solution would be to introduce a new ``inode''
> that knew it was a copy of the old inode but at a different offset.
> These new ``inodes'' would then have a linked list of their own, which
> could be followed for update purposes.
>
> Extra inodes for files could also be extended to allow an offset at
> say 4TB or so into a file so that we can handle any sized file.
> Though obviously you can't cache it all at once, but you could cache
> any piece ;)
This is a nice idea indeed. Maybe we could even use the
'extra inodes' idea to implement arbitrarily large files
on _any_ architecture...
When the on-disk maximum size is reached, just grab a
slave inode and start on part two.
The same for in-memory inodes, but there the maximum
size may be different.
Rik.
+-------------------------------------------+--------------------------+
| Linux: - LinuxHQ MM-patches page | Scouting webmaster |
| - kswapd ask-him & complain-to guy | Vries cubscout leader |
| http://www.phys.uu.nl/~riel/ | <H.H.vanRiel@phys.uu.nl> |
+-------------------------------------------+--------------------------+
* Re: Fixing private mappings
1998-04-23 5:06 Fixing private mappings Eric W. Biederman
1998-04-23 8:28 ` Rik van Riel
@ 1998-04-23 15:12 ` Benjamin C.R. LaHaise
1998-04-23 22:03 ` Eric W. Biederman
1 sibling, 1 reply; 6+ messages in thread
From: Benjamin C.R. LaHaise @ 1998-04-23 15:12 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: linux-mm
On 23 Apr 1998, Eric W. Biederman wrote:
> Please excuse me for thinking out loud, but private mappings seem to be
> a hard problem that has not been correctly implemented in the Linux
> kernel.
>
> Definition of Private Mappings:
> A private mapping is a copy-on-write mapping of a file.
>
> That is if the file is written to after the mapping is established,
> the contents of the mapping will always remain what the contents of
> the file was at the time of the private mapping.
No, this is not the case. Examine the behaviour of other Unices out
there that implement mmap. The following is quoted from the man page for
mmap on Solaris:
MAP_SHARED and MAP_PRIVATE describe the disposition of write
references to the memory object. If MAP_SHARED is specified,
write references will change the memory object. If
MAP_PRIVATE is specified, the initial write reference will
create a private copy of the memory object page and redirect
the mapping to the copy. Either MAP_SHARED or MAP_PRIVATE
must be specified, but not both. The mapping type is
retained across a fork(2).
Note: 'the initial write reference will create a private copy' -- not
the act of reading or mapping.
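The point about the initial write reference can be checked from user
space. The following is a minimal sketch, not code from this thread: the
function name private_write_stays_private is hypothetical, and it
assumes a POSIX system with a writable /tmp. It stores through a
MAP_PRIVATE mapping and then rereads the file through the descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a store through a MAP_PRIVATE mapping left the file
 * unchanged, 0 if the file changed, -1 if the setup failed. */
int private_write_stays_private(void)
{
    char path[] = "/tmp/private-map-XXXXXX";
    char buf[4];
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives until fd is closed */

    if (write(fd, "file", 4) != 4) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Reading does not copy anything; the mapping shows the file. */
    assert(memcmp(map, "file", 4) == 0);

    /* Initial write reference: the page is copied, then modified. */
    memcpy(map, "copy", 4);

    /* The file itself must still hold the original bytes. */
    ssize_t n = pread(fd, buf, sizeof buf, 0);
    int unchanged = (n == 4 && memcmp(buf, "file", 4) == 0);

    munmap(map, 4);
    close(fd);
    return unchanged;
}
```

A return value of 1 is what the quoted man page text describes: the
store went to a private copy of the page and never reached the memory
object.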
> Further if another private mapping is established after one
> private mapping has been established it should have the file contents
> of the file at the time the mapping is established. Not at the time
> any previous private mapping was established.
Linux does behave this way currently.
...
> A slightly more generic solution would be to introduce a new ``inode''
> that knew it was a copy of the old inode but at a different offset.
> These new ``inodes'' would then have a linked list of their own, which
> could be followed for update purposes.
...
This would be the appropriate thing to do if you'd like to see such
exotic behaviour ;-)
-ben
* Re: Fixing private mappings
1998-04-23 15:12 ` Benjamin C.R. LaHaise
@ 1998-04-23 22:03 ` Eric W. Biederman
1998-04-24 20:37 ` Stephen C. Tweedie
0 siblings, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 1998-04-23 22:03 UTC (permalink / raw)
To: Benjamin C.R. LaHaise; +Cc: linux-mm
>>>>> "BL" == Benjamin C R LaHaise <blah@kvack.org> writes:
BL> On 23 Apr 1998, Eric W. Biederman wrote:
>> Please excuse me for thinking out loud, but private mappings seem to be
>> a hard problem that has not been correctly implemented in the Linux
>> kernel.
>>
>> Definition of Private Mappings:
>> A private mapping is a copy-on-write mapping of a file.
>>
>> That is if the file is written to after the mapping is established,
>> the contents of the mapping will always remain what the contents of
>> the file was at the time of the private mapping.
BL> No, this is not the case. Examine the behaviour of other Unices out
BL> there that implement mmap. The following is quoted from the man page for
BL> mmap on Solaris:
BL> MAP_SHARED and MAP_PRIVATE describe the disposition of write
BL> references to the memory object. If MAP_SHARED is specified,
BL> write references will change the memory object. If
BL> MAP_PRIVATE is specified, the initial write reference will
BL> create a private copy of the memory object page and redirect
BL> the mapping to the copy. Either MAP_SHARED or MAP_PRIVATE
BL> must be specified, but not both. The mapping type is
BL> retained across a fork(2).
BL> Note: 'the initial write reference will create a private copy' -- not
BL> the act of reading or mapping.
Right. That is probably the only reasonable way to implement it.
I stated it as I did so what happens if another process writes to the
file is clear. Another process writing to the file will be the
`initial write reference'.
So logically MAP_PRIVATE gives you a snapshot of the contents of a
file. Not that it actually takes that snapshot...
Possibly I'm failing to see the difference in the definitions?
Was it the "always remains the same" bit? I was thinking of what the
contents of the mapping would be if you don't write to it.
>> Further if another private mapping is established after one
>> private mapping has been established it should have the file contents
>> of the file at the time the mapping is established. Not at the time
>> any previous private mapping was established.
BL> Linux does behave this way currently.
Only most of the time.
It breaks for private mappings at 1k alignment. I have written a
program on 2.0.32 and verified this. I don't believe the code has
significantly changed since then.
The problem is that update_vm_cache currently only looks for the primary
inode page, the one at (offset%PAGE_SIZE)==0. So the other page, at
offset%PAGE_SIZE==1k, is not updated.
BL> This would be the appropriate thing to do if you'd like to see such
BL> exotic behaviour ;-)
I guess everyone seems to like this :)
Eric
* Re: Fixing private mappings
1998-04-23 22:03 ` Eric W. Biederman
@ 1998-04-24 20:37 ` Stephen C. Tweedie
1998-04-25 5:30 ` Eric W. Biederman
0 siblings, 1 reply; 6+ messages in thread
From: Stephen C. Tweedie @ 1998-04-24 20:37 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Benjamin C.R. LaHaise, linux-mm
Hi,
On 23 Apr 1998 17:03:02 -0500, ebiederm+eric@npwt.net (Eric
W. Biederman) said:
>>> Definition of Private Mappings:
>>> A private mapping is a copy-on-write mapping of a file.
>>>
>>> That is if the file is written to after the mapping is established,
>>> the contents of the mapping will always remain what the contents of
>>> the file was at the time of the private mapping.
BL> Note: 'the initial write reference will create a private copy' -- not
BL> the act of reading or mapping.
> Right. That is probably the only reasonable way to implement it.
Indeed.
> I stated it as I did so what happens if another process writes to the
> file is clear. Another process writing to the file will be the
> `initial write reference'.
No --- in the context of a MAP_PRIVATE mapping, only in-memory writes to
the privately mapped virtual address space count as write references.
> So logically MAP_PRIVATE gives you a snapshot of the contents of a
> file. Not that it actually takes that snapshot...
No, it shouldn't --- it maps the file into the process address space,
and all updates to the file are reflected in the process's virtual
memory copy. Only if the process tries to write to the mapping is the
COW activated.
> Possibly I'm failing to see the difference in the definitions?
Yep. MAP_PRIVATE mappings preserve the correspondence over writes to
the file by any mechanism other than modifying the mapping itself.
Note that the semantics are relaxed a bit if we have non-page-aligned
private maps, in that the correspondence between the mapped image and
the file contents is no longer always preserved if the file is updated.
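The distinction can be observed from user space. The following is a
minimal sketch, not code from this thread: observe_private_mapping and
the two flag names are hypothetical, and a writable /tmp is assumed. It
updates the file through the descriptor once before and once after the
process stores into its private mapping. Note that POSIX leaves it
unspecified whether file changes made after mmap() are visible through a
MAP_PRIVATE mapping, so the first flag reports what the running system
does rather than a guarantee.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

enum {
    SAW_BEFORE_COW = 1,  /* file write showed up in the untouched mapping */
    SAW_AFTER_COW  = 2   /* file write showed up after a private store */
};

/* Returns a mask of the flags above, or -1 if the setup failed. */
int observe_private_mapping(void)
{
    char path[] = "/tmp/private-map-XXXXXX";
    int result = 0;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);

    if (pwrite(fd, "aaaa", 4, 0) != 4) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    assert(map[0] == 'a');          /* a read does not trigger the copy */

    /* Update the file through the descriptor, not through the mapping. */
    if (pwrite(fd, "bbbb", 4, 0) != 4)
        goto fail;
    if (memcmp(map, "bbbb", 4) == 0)
        result |= SAW_BEFORE_COW;

    map[0] = 'X';                   /* first store: page becomes private */

    if (pwrite(fd, "cccc", 4, 0) != 4)
        goto fail;
    if (memcmp(map + 1, "ccc", 3) == 0)
        result |= SAW_AFTER_COW;

    munmap(map, 4);
    close(fd);
    return result;
fail:
    munmap(map, 4);
    close(fd);
    return -1;
}
```

With the semantics described above one would expect SAW_BEFORE_COW to be
set and SAW_AFTER_COW to be clear: the mapping follows the file until
the process writes to it, and the copied page stops following the file
afterwards.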
> The problem is that update_vm_cache currently only looks for the primary
> inode page, the one at (offset%PAGE_SIZE)==0. So the other page, at
> offset%PAGE_SIZE==1k, is not updated.
Yep, but we are not required to support non-page-aligned maps at all, so
hacking it for special read-only cases is no big deal. Doing a search
for all overlapping mapped pages would be far too slow.
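For a program that needs data at an offset which is not page-aligned,
the usual user-space approach is to map from the enclosing page boundary
and step forward inside the mapping. The following is a minimal sketch
of that approach, not code from this thread: struct file_view,
view_open, view_close and view_demo are hypothetical names, and a
writable /tmp is assumed for the self-check. It does not depend on the
kernel accepting unaligned offsets; the mmap(2) interface on current
Linux requires the offset to be a multiple of the page size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct file_view {
    void  *base;    /* address returned by mmap, needed for munmap */
    size_t length;  /* length of the whole mapping */
    char  *data;    /* points at the requested file offset */
};

/* Map len bytes starting at an arbitrary file offset by mapping from
 * the page boundary at or below it. Returns 0 on success, -1 on error. */
int view_open(struct file_view *v, int fd, off_t offset, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || offset < 0)
        return -1;

    off_t aligned = offset - (offset % page);   /* round down */
    size_t delta = (size_t)(offset - aligned);  /* bytes to skip */
    assert(delta < (size_t)page);

    void *base = mmap(NULL, len + delta, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (base == MAP_FAILED)
        return -1;

    v->base = base;
    v->length = len + delta;
    v->data = (char *)base + delta;
    return 0;
}

void view_close(struct file_view *v)
{
    munmap(v->base, v->length);
}

/* Self-check: returns 1 if the bytes at file offset 1024 are visible
 * through the view, 0 if not, -1 if the setup failed. */
int view_demo(void)
{
    char path[] = "/tmp/view-XXXXXX";
    char buf[2048];
    struct file_view v;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    memset(buf, 'a', 1024);
    memset(buf + 1024, 'b', 1024);
    if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf ||
        view_open(&v, fd, 1024, 16) != 0) {
        close(fd);
        return -1;
    }
    int ok = (v.data[0] == 'b' && v.data[15] == 'b');
    view_close(&v);
    close(fd);
    return ok;
}
```

Because every mapping then starts on a page boundary, each file page has
a single cached copy and the overlapping-page search mentioned above
never arises.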
--Stephen
* Re: Fixing private mappings
1998-04-24 20:37 ` Stephen C. Tweedie
@ 1998-04-25 5:30 ` Eric W. Biederman
0 siblings, 0 replies; 6+ messages in thread
From: Eric W. Biederman @ 1998-04-25 5:30 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: linux-mm
>>>>> "ST" == Stephen C Tweedie <sct@dcs.ed.ac.uk> writes:
ST> Hi,
ST> On 23 Apr 1998 17:03:02 -0500, ebiederm+eric@npwt.net (Eric
ST> W. Biederman) said:
ST> No --- in the context of a MAP_PRIVATE mapping, only in-memory writes to
ST> the privately mapped virtual address space count as write references.
Got it.
I still like the semantics I defined, but if they aren't what
MAP_PRIVATE is defined to mean I won't worry about it for the present.
Sometime it might be worth it/fun implementing a MAP_SNAPSHOT, but I
won't worry about that for the present.
ST> Yep, but we are not required to support non-page-aligned maps at all, so
ST> hacking it for special read-only cases is no big deal. Doing a search
ST> for all overlapping mapped pages would be far too slow.
I think in the general case I could implement it without overhead and
in the common a.out case within a factor of 2, and in the worst case
within a factor of 4 (assuming a restriction of 1k alignment). And
this is primarily memcpy cost; there should be no need for extra disk
i/o.
The scheme I'm playing with will share the same case as extra
huge file I/O (> 16TB), and in the common case should perform
identically to what we have now.
Thanks for setting me straight. It hadn't been my intention to play
with mmap until I found this really weird use that mmap makes of
the page_cache, so I really wasn't prepared for that one.
Eric