From: Matthew Dillon <dillon@apollo.backplane.com>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Rik van Riel <riel@conectiva.com.br>,
Chris Wedgwood <cw@f00f.org>,
linux-mm@kvack.org, linux-kernel@vger.rutgers.edu
Subject: Re: RFC: design for new VM
Date: Fri, 4 Aug 2000 16:51:54 -0700 (PDT)
Message-ID: <200008042351.QAA89101@apollo.backplane.com>
In-Reply-To: <Pine.LNX.4.10.10008041033230.813-100000@penguin.transmeta.com>
:> (fork) deal. Physical segment sharing outside of clone is something
:> Linux could use to, I don't think it does it either. It's not easy to
:> do right.
:
:It's probably impossible to do right. Basically, if you do it, you do it
:wrong.
:
:As far as I can tell, you basically screw yourself on the TLB and locking
:if you ever try to implement this. And frankly I don't see how you could
:avoid getting screwed.
:
:There are architecture-specific special cases, of course. On ia64, the
:..
I spent a weekend a few months ago trying to implement page table
sharing in FreeBSD -- and gave up, but it left me with the feeling
that it should be possible to do without polluting the general VM
architecture.
For IA32, what it comes down to is that the page table generated by
any segment-aligned mmap() (segment == 4MB) made by two processes
should be shareable, simply by sharing the page directory entry (and thus
the physical page representing 4MB worth of mappings). This would be
restricted to MAP_SHARED mappings with the same protections, but the two
processes would not have to map the segments at the same VM address; they
need only be segment-aligned.
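
The eligibility rule above can be sketched in a few lines of Python. This
is an illustration only, not FreeBSD or Linux code: the Mapping record,
can_share(), and the requirement that both mappings cover the same backing
object at the same offset are assumptions added for the example. The 4MB
segment size follows from non-PAE IA32 (1024 four-byte PTEs per 4KB page
table page).

```python
from dataclasses import dataclass

PAGE_SIZE = 4096                           # IA32 base page size
PTES_PER_TABLE = 1024                      # 4-byte PTEs in one 4KB page table page
SEGMENT_SIZE = PAGE_SIZE * PTES_PER_TABLE  # 4MB covered by one page directory entry


@dataclass(frozen=True)
class Mapping:
    """Hypothetical description of one mmap() region in one process."""
    vaddr: int       # starting virtual address
    length: int      # length in bytes
    obj_id: int      # identity of the backing object (file, sysv shm segment)
    obj_offset: int  # offset of the mapping within that object
    prot: int        # protection bits
    shared: bool     # True for MAP_SHARED


def pde_index(vaddr: int) -> int:
    """Page directory slot (0..1023) that covers a 32-bit virtual address."""
    return (vaddr >> 22) & 0x3FF


def can_share(a: Mapping, b: Mapping) -> bool:
    """True if the first 4MB segment of both mappings could use one page table."""
    return (
        a.shared and b.shared                   # MAP_SHARED only
        and a.prot == b.prot                    # same protections
        and a.obj_id == b.obj_id                # assumption: same backing object
        and a.obj_offset == b.obj_offset        # assumption: same offset into it
        and a.vaddr % SEGMENT_SIZE == 0         # segment-aligned in process A
        and b.vaddr % SEGMENT_SIZE == 0         # segment-aligned in process B
        and min(a.length, b.length) >= SEGMENT_SIZE
    )
```

Note that the two virtual addresses may differ: a mapping at 0x40000000 in
one process and at 0x80000000 in another occupy different page directory
slots (256 and 512), but both slots could point at the same page table page.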
This would be a transparent optimization wholly invisible to the process,
something that would be optionally implemented in the machine-dependent
part of the VM code (with general support in the machine-independent
part for the concept). If the process did anything to create a mapping
mismatch, such as call mprotect(), the shared page table would be split.
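
One way to picture the split is a reference-counted page table that is
copied when a sharer diverges. The sketch below is a toy user-space model
under that assumption; PageTable, Process, attach() and mprotect_segment()
are hypothetical names, and a real implementation would also have to deal
with TLB invalidation and locking, which are not modelled here.

```python
class PageTable:
    """Toy model of one page table page covering a 4MB segment."""

    def __init__(self, prot):
        self.prot = prot   # protection shared by every sharer
        self.ptes = {}     # page index within the segment -> physical frame
        self.refs = 0      # number of page directory entries pointing here


class Process:
    def __init__(self, name):
        self.name = name
        self.page_dir = {}  # page directory index -> PageTable


def attach(proc, pde, table):
    """Point one page directory entry at a (possibly shared) page table."""
    table.refs += 1
    proc.page_dir[pde] = table


def mprotect_segment(proc, pde, new_prot):
    """Change protection for one segment, splitting the table if it is shared."""
    table = proc.page_dir[pde]
    if table.prot == new_prot:
        return
    if table.refs > 1:
        # Mapping mismatch: give this process a private copy of the table.
        private = PageTable(new_prot)
        private.ptes = dict(table.ptes)
        table.refs -= 1
        attach(proc, pde, private)
    else:
        # Sole user: the table can simply be updated in place.
        table.prot = new_prot
```

The other sharers keep the original table untouched, so the split stays
invisible to them.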
The problem being solved for FreeBSD is actually quite serious -- due to
FreeBSD's tracking of individual page table entries, being able to share
a page table would radically reduce the amount of tracking information
required for any large shared areas (shared libraries, large shared file
mappings, large sysv shared memory mappings). For Linux the problem is
relatively minor -- Linux would save considerable page table memory.
Linux is still reasonably scalable without the optimization while
FreeBSD currently falls on its face for truly huge shared mappings
(e.g. 300 processes all mapping a shared 1GB memory area, aka Oracle 8i).
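
To put rough numbers on the 300-process, 1GB example, here is a
back-of-the-envelope calculation. It assumes non-PAE IA32 (4KB pages,
4-byte PTEs, 1024 PTEs per page table page) and that every process has
touched the whole area, so all page tables are fully populated; the
function names are made up for the illustration.

```python
PAGE_SIZE = 4096        # bytes per page on IA32
PTES_PER_TABLE = 1024   # 4-byte PTEs per 4KB page table page


def page_table_bytes(mapping_bytes, procs, shared):
    """Memory spent on page table pages for one fully populated mapping."""
    ptes = mapping_bytes // PAGE_SIZE
    tables = -(-ptes // PTES_PER_TABLE)   # ceiling division
    per_copy = tables * PAGE_SIZE
    return per_copy if shared else per_copy * procs


def tracked_ptes(mapping_bytes, procs, shared):
    """Individual page table entries that have to be tracked."""
    ptes = mapping_bytes // PAGE_SIZE
    return ptes if shared else ptes * procs


GB = 1 << 30
MB = 1 << 20

unshared = page_table_bytes(GB, 300, shared=False)
shared = page_table_bytes(GB, 300, shared=True)
print(unshared // MB, "MB of page tables without sharing")  # 300
print(shared // MB, "MB of page tables with sharing")       # 1
print(tracked_ptes(GB, 300, shared=False), "PTEs tracked without sharing")
print(tracked_ptes(GB, 300, shared=True), "PTEs tracked with sharing")
```

Under these assumptions the page tables shrink from 300MB to 1MB, and the
number of page table entries that need per-entry tracking drops from
78,643,200 to 262,144.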
(Linux falls on its face for other reasons, mainly the fact that it
maps all of physical memory into KVM in order to manage it).
I think the loss of MP locking for this situation is outweighed by the
benefit of a huge reduction in page faults -- rather than see 300
processes each take a page fault on the same page, only the first process
would and the pte would already be in place when the others got to it.
When it comes right down to it, page faults on shared data sets are not
really an issue for MP scalability.
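
The fault-count argument can be checked with a toy model. The sketch
assumes that a fault is taken exactly when the touching process finds no
pte in the page table it is using; count_faults() is a hypothetical helper,
not an interface from either kernel.

```python
def count_faults(num_procs, pages_touched, shared_table):
    """Count faults when each process touches the same set of pages."""
    faults = 0
    shared = set()                       # ptes present in the shared table
    for _ in range(num_procs):
        # With sharing every process sees the same table; otherwise
        # each process starts with an empty private table.
        table = shared if shared_table else set()
        for page in pages_touched:
            if page not in table:
                faults += 1              # first touch through this table
                table.add(page)
    return faults


print(count_faults(300, range(1), shared_table=False))  # 300
print(count_faults(300, range(1), shared_table=True))   # 1
```

With private tables the fault count grows with the number of processes;
with a shared table it depends only on the number of distinct pages touched.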
In any case, this is a 'dream' for me for FreeBSD right now. It's a very
difficult problem to solve.
-Matt