From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from flinx.npwt.net (eric@flinx.npwt.net [208.236.161.237])
	by kvack.org (8.8.7/8.8.7) with ESMTP id AAA13132
	for ; Thu, 23 Apr 1998 00:54:48 -0400
Subject: Fixing private mappings
From: ebiederm+eric@npwt.net (Eric W. Biederman)
Date: 23 Apr 1998 00:06:31 -0500
Message-ID: 
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org
List-ID: 

Please excuse me for thinking out loud, but private mappings seem to be
a hard problem that has not been correctly implemented in the Linux
kernel.

Definition of Private Mappings:

A private mapping is a copy-on-write mapping of a file.  That is, if
the file is written to after the mapping is established, the contents
of the mapping always remain what the contents of the file were at the
time the private mapping was made.  Further, if another private mapping
is established after an earlier one, it should reflect the contents of
the file at the time it is established, not at the time any previous
private mapping was established.

A few ideas occur to me for specific problems, but the whole problem is
a challenge.

What I do know is that we need some kind of write barrier, where we
check whether we have made a copy of a page for any private mappings
that may exist before we write to it.

How should we find those private mappings?  Wait.  That would be:
follow inode->i_mmap whenever we read in a page, and then have code in
generic_file_write and update_vm_cache to make sure the copies are made
at the appropriate times.

How should we maximize sharing of private mappings?  The simplest
solution would be to continue with the current solution and just
restrict mappings to 512-byte boundaries.  A slightly more generic
solution would be to introduce a new ``inode'' that knew it was a copy
of the old inode but at a different offset.  These new ``inodes'' would
then have a linked list of their own that could be followed for update
purposes.

--

Extra inodes for files could also be extended to allow an offset of,
say, 4TB or so into a file, so that we can handle any size of file.
Obviously you can't cache it all at once, but you could cache any piece
;)  There is a possibility there for per-inode metadata too, but I'm
not certain about that one.

Since my initial goal was large file support, with the common case on
Intel being restricted to 32-bit integers, I think I'll play with the
extra-inodes approach.  It will probably be smart to restrict ourselves
to still only allowing mappings on fs block boundaries.  There is some
efficiency gained there (when reading in pages that are not in memory
at all), but otherwise we should be fine.

A few rough sketches of what I am describing are appended below.

Eric
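
To make the copy-on-write semantics above concrete, here is a minimal
userspace demonstration using nothing but standard mmap() with
MAP_PRIVATE (error checking omitted for brevity).  A store through the
private mapping forces a copy of the page, so the store never reaches
the file itself:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		int fd;
		char *map;
		char buf[5];

		fd = open("cow_demo.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
		write(fd, "AAAA", 4);		/* file now contains "AAAA" */

		map = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
		map[0] = 'B';			/* store forces a copy-on-write */

		lseek(fd, 0, SEEK_SET);
		read(fd, buf, 4);		/* re-read the file itself */
		buf[4] = '\0';

		printf("mapping: %.4s  file: %s\n", map, buf);
		/* prints "mapping: BAAA  file: AAAA": the store stayed private */

		munmap(map, 4);
		close(fd);
		return 0;
	}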
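
A rough sketch of the write barrier discussed above.  The field names
are only approximations of the vm_area_struct/inode layout, and
copy_page_for_vma() is a made-up helper; the point is just the shape of
the walk over inode->i_mmap before a page of the file is modified by
generic_file_write or update_vm_cache:

	/* Sketch only: give every private mapping that still shares the
	 * page at `offset' its own copy before the page is written. */
	static void cow_private_mappings(struct inode *inode, unsigned long offset)
	{
		struct vm_area_struct *vma;

		for (vma = inode->i_mmap; vma; vma = vma->vm_next_share) {
			/* shared mappings are supposed to see the write */
			if (vma->vm_flags & VM_SHARED)
				continue;
			/* does this vma map the page being written at all? */
			if (offset < vma->vm_offset ||
			    offset >= vma->vm_offset + (vma->vm_end - vma->vm_start))
				continue;
			copy_page_for_vma(vma, offset);	/* hypothetical helper */
		}
	}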
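
And a purely hypothetical sketch of the ``extra inode'' idea; nothing
like this exists and the names are invented.  It is only meant to show
how a per-offset proxy, plus a linked list for propagating updates,
might look:

	/* A lightweight ``extra inode'' that presents the real inode at a
	 * fixed byte offset.  A list of these hangs off the real inode so
	 * that updates to the file can be propagated to each proxy. */
	struct offset_inode {
		struct inode        *real;	/* the underlying file        */
		unsigned long long   base;	/* where this window starts,
						   e.g. a multiple of 4TB    */
		struct offset_inode *next;	/* chain walked on updates    */
	};

A read at position pos through one of these proxies would really be a
read at base + pos in the underlying file, which keeps the page index
within each proxy small enough for 32-bit arithmetic.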