Date: Fri, 4 Aug 2000 17:03:43 -0700 (PDT)
From: Linus Torvalds
Subject: Re: RFC: design for new VM
In-Reply-To: <200008042351.QAA89101@apollo.backplane.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
To: Matthew Dillon
Cc: Rik van Riel, Chris Wedgwood, linux-mm@kvack.org, linux-kernel@vger.rutgers.edu

On Fri, 4 Aug 2000, Matthew Dillon wrote:
> :
> :There are architecture-specific special cases, of course. On ia64, the
> :..
>
> I spent a weekend a few months ago trying to implement page table
> sharing in FreeBSD -- and gave up, but it left me with the feeling
> that it should be possible to do without polluting the general VM
> architecture.
>
> For IA32, what it comes down to is that the page table generated by
> any segment-aligned mmap() (segment == 4MB) made by two processes
> should be shareable, simply by sharing the page directory entry (and
> thus the physical page representing 4MB worth of mappings). This would
> be restricted to MAP_SHARED mappings with the same protections, but the
> two processes would not have to map the segments at the same VM
> address, they need only be segment-aligned.

I agree that from a page table standpoint you should be correct.

I don't think that the other issues are as easily resolved, though.
Especially with address space IDs on other architectures it can get
_really_ interesting to do TLB invalidates correctly to other CPUs etc
(you need to keep track of who shares parts of your page tables etc).

> This would be a transparent optimization wholly invisible to the
> process, something that would be optionally implemented in the
> machine-dependent part of the VM code (with general support in the
> machine-independent part for the concept). If the process did anything
> to create a mapping mismatch, such as call mprotect(), the shared page
> table would be split.

Right. But what about the TLB?
It's not a problem on the x86, because the x86 doesn't have ASNs anyway.
But for it to be a valid notion, I feel that it should be able to be
portable too.

You have to have some page table locking mechanism for SMP eventually: I
think you miss some of the problems because the current FreeBSD SMP stuff
is mostly still "big kernel lock" (outdated info?), and you'll end up
kicking yourself in a big way when you have the 300 processes sharing the
same lock for that region..

(Not that I think you'd necessarily have much contention on the lock -
the problem tends to be more in the logistics of keeping track of the
locks of partial VM regions etc).

> (Linux falls on its face for other reasons, mainly the fact that it
> maps all of physical memory into KVM in order to manage it).

Not true any more.. Trying to map 64GB of RAM convinced us otherwise ;)

> I think the loss of MP locking for this situation is outweighed by the
> benefit of a huge reduction in page faults -- rather than see 300
> processes each take a page fault on the same page, only the first
> process would and the pte would already be in place when the others got
> to it. When it comes right down to it, page faults on shared data sets
> are not really an issue for MP scalability.

I think you'll find that there are all these small details that just
cannot be solved cleanly. Do you want to be stuck with an x86-only
solution?

That said, I cannot honestly say that I have tried very hard to come up
with solutions. I just have this feeling that it's a dark ugly hole that
I wouldn't want to go down..

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
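Dillon's sharing condition quoted above (segment-aligned MAP_SHARED mappings with the same protections, possibly at different virtual addresses) can be written down as a small predicate. The C below is a minimal sketch under stated assumptions, not FreeBSD or Linux code: `struct mapping`, `object_id` and `can_share_page_tables()` are hypothetical names; it assumes IA32 without PAE (one page directory entry covers 4MB); and it assumes both processes must map the same backing object at the same offset so that their page table entries would be identical. It deliberately ignores the TLB invalidation and locking problems Linus raises, which he argues are the hard part.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>   /* MAP_SHARED, MAP_PRIVATE, PROT_* */

/* Assumption: IA32 without PAE, so one page directory entry maps 4MB. */
#define SEGMENT_SIZE ((uintptr_t)4 << 20)

/* Hypothetical per-process view of one mmap()ed region. */
struct mapping {
    uintptr_t va;        /* virtual start address in this process */
    size_t    len;       /* length in bytes */
    int       prot;      /* PROT_READ, PROT_WRITE, ... */
    int       flags;     /* MAP_SHARED or MAP_PRIVATE */
    uint64_t  object_id; /* hypothetical identifier of the backing object */
    uint64_t  offset;    /* offset of the region within that object */
};

static bool segment_aligned(uintptr_t x)
{
    return (x & (SEGMENT_SIZE - 1)) == 0;
}

/*
 * True if two processes' mappings could point at the same page-table
 * pages (i.e. share page directory entries). The virtual addresses may
 * differ; each must start on a segment boundary and cover whole segments.
 * TLB shootdown and page table locking are not modelled here.
 */
static bool can_share_page_tables(const struct mapping *a,
                                  const struct mapping *b)
{
    if (!(a->flags & MAP_SHARED) || !(b->flags & MAP_SHARED))
        return false;
    if (a->prot != b->prot)             /* e.g. after mprotect(): split */
        return false;
    if (a->object_id != b->object_id || a->offset != b->offset ||
        a->len != b->len)
        return false;                   /* different pages, different PTEs */
    return a->len != 0 &&
           segment_aligned(a->va) && segment_aligned(b->va) &&
           segment_aligned((uintptr_t)a->len);
}
```

In this model a mismatch such as an mprotect() on one side shows up as differing `prot` values, at which point the shared page table would have to be split back into per-process copies.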