linux-mm.kvack.org archive mirror
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Rik van Riel <riel@conectiva.com.br>,
	Chris Wedgwood <cw@f00f.org>,
	linux-mm@kvack.org, linux-kernel@vger.rutgers.edu
Subject: Re: RFC: design for new VM
Date: Fri, 4 Aug 2000 18:52:16 -0700 (PDT)
Message-ID: <200008050152.SAA89298@apollo.backplane.com>
In-Reply-To: <Pine.LNX.4.10.10008041655420.11340-100000@penguin.transmeta.com>

:I agree that from a page table standpoint you should be correct. 
:
:I don't think that the other issues are as easily resolved, though.
:Especially with address space ID's on other architectures it can get
:_really_ interesting to do TLB invalidates correctly to other CPU's etc
:(you need to keep track of who shares parts of your page tables etc).
:
:...
:>     mismatch, such as call mprotect(), the shared page table would be split.
:
:Right. But what about the TLB?

    I'm not advocating trying to share TLB entries, that would be 
    a disaster.  I'm contemplating just the physical page table structure.
    e.g. if you mmap() a 1GB file shared (or private read-only) into 300
    independent processes, it should be possible to share all the meta-data
    required to support that mapping except for the TLB entries themselves.
    ASNs shouldn't make a difference... presumably the tags on the TLB
    entries are added on after the metadata lookup.  I'm also not advocating
    attempting to share intermediate 'partial' in-memory TLB caches (hash
    tables or other structures).  Those are typically fixed in size,
    per-cpu, and would not be impacted by scale.
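    To make the "share the table, never the TLB" idea concrete, here is a
    minimal sketch assuming a toy model in which one page-table page is
    reference-counted and pointed at by several address spaces. All names
    (struct ptpage, pt_share, pt_make_private, pt_write_protect, PTE_WRITE)
    are hypothetical, not FreeBSD or Linux code. It shows the split on an
    mprotect()-style change: the changing process gets a private copy while
    the other sharers keep the original.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model: one page-table page of PT_ENTRIES ptes that
 * several address spaces may point at. Not real kernel structures. */
#define PT_ENTRIES 1024
#define PTE_WRITE  0x2u

struct ptpage {
    unsigned refcnt;                 /* number of address spaces sharing it */
    unsigned long pte[PT_ENTRIES];   /* physical frame | protection bits */
};

struct aspace {
    struct ptpage *pt;               /* this process's (possibly shared) table */
};

/* Attach a process to an existing table instead of building its own copy. */
void pt_share(struct aspace *as, struct ptpage *pt)
{
    pt->refcnt++;
    as->pt = pt;
}

/* Before a process changes protections (e.g. via mprotect()), give it a
 * private copy if the table is shared, so the other sharers are unaffected.
 * TLB entries are never shared: a real kernel would still have to flush
 * this process's own TLB entries after changing its ptes. */
struct ptpage *pt_make_private(struct aspace *as)
{
    struct ptpage *old = as->pt;
    if (old->refcnt == 1)
        return old;                  /* already private: modify in place */
    struct ptpage *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    memcpy(copy->pte, old->pte, sizeof old->pte);
    copy->refcnt = 1;
    old->refcnt--;
    as->pt = copy;
    return copy;
}

/* Clear write permission on one entry, splitting the table first if needed. */
int pt_write_protect(struct aspace *as, size_t idx)
{
    struct ptpage *pt = pt_make_private(as);
    if (pt == NULL)
        return -1;
    pt->pte[idx] &= ~(unsigned long)PTE_WRITE;
    return 0;
}
```

    The split is lazy by design: sharers pay nothing until one of them
    actually diverges from the common mapping.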

:You have to have some page table locking mechanism for SMP eventually: I
:think you miss some of the problems because the current FreeBSD SMP stuff
:is mostly still "big kernel lock" (outdated info?), and you'll end up
:kicking yourself in a big way when you have the 300 processes sharing the
:same lock for that region..

    If it were a long-held lock I'd worry, but if it's a lock on a pte
    I don't think it can hurt.  After all, even with separate page tables
    if 300 processes fault on the same backing file offset you are going
    to hit a bottleneck with MP locking anyway, just at a deeper level
    (the filesystem rather than the VM system).  The BSDI folks did a lot
    of testing with their fine-grained MP implementation and found that
    putting a global lock around the entire VM system had absolutely no 
    impact on MP performance.
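    A minimal sketch of the short-held lock argument, assuming a toy shared
    page-table page protected by a single pthread mutex that is held only
    long enough to check and fill one pte. Threads stand in for the 300
    processes; the names (struct shared_pt, shared_fault, touch_same_page)
    are hypothetical. Only the first faulter installs the pte, and every
    later one finds it already present.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sketch: a page-table page shared by many processes, guarded
 * by one short-held mutex. Not real kernel code. */
#define NPTES     512
#define MAX_PROCS 300

struct shared_pt {
    pthread_mutex_t lock;
    unsigned long pte[NPTES];   /* 0 means "not mapped yet" */
    unsigned long faults;       /* how many times a pte was actually filled */
};

/* Fault handler: install the mapping only if nobody else already has.
 * Returns 1 if this caller filled the pte, 0 if it was already present. */
int shared_fault(struct shared_pt *pt, size_t idx, unsigned long frame)
{
    int filled = 0;
    pthread_mutex_lock(&pt->lock);
    if (pt->pte[idx] == 0) {
        pt->pte[idx] = frame;   /* the real lookup work would happen here */
        pt->faults++;
        filled = 1;
    }
    pthread_mutex_unlock(&pt->lock);
    return filled;
}

struct faulter_arg {
    struct shared_pt *pt;
    size_t idx;
};

static void *faulter(void *p)
{
    struct faulter_arg *a = p;
    shared_fault(a->pt, a->idx, 0x1000);
    return NULL;
}

/* Simulate nproc "processes" (threads here) touching the same page.
 * Returns the number of faults that filled the pte, or -1 on error. */
long touch_same_page(struct shared_pt *pt, size_t idx, int nproc)
{
    pthread_t tid[MAX_PROCS];
    struct faulter_arg arg = { pt, idx };
    int started = 0;

    if (nproc > MAX_PROCS)
        return -1;
    while (started < nproc &&
           pthread_create(&tid[started], NULL, faulter, &arg) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);   /* join makes pt->faults safe to read */
    return started == nproc ? (long)pt->faults : -1;
}
```

    However many threads race on the same page, the fill count stays at one;
    the rest only take the lock briefly and see the pte already in place.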

:>     (Linux falls on its face for other reasons, mainly the fact that it
:>     maps all of physical memory into KVM in order to manage it).
:
:Not true any more.. Trying to map 64GB of RAM convinced us otherwise ;)

    Oh, that's cool!  I don't think anyone in FreeBSDland has bothered with
    large-memory (> 4GB) configurations; there doesn't seem to be 
    much demand for such a thing on IA32.

:>     I think the loss of MP locking for this situation is outweighed by the
:>     benefit of a huge reduction in page faults -- rather than see 300 
:>     processes each take a page fault on the same page, only the first process
:>     would and the pte would already be in place when the others got to it.
:>     When it comes right down to it, page faults on shared data sets are not
:>     really an issue for MP scalability.
:
:I think you'll find that there are all these small details that just
:cannot be solved cleanly. Do you want to be stuck with a x86-only
:solution?
:
:That said, I cannot honestly say that I have tried very hard to come up
:with solutions. I just have this feeling that it's a dark ugly hole that I
:wouldn't want to go down..
:
:			Linus

    Well, I don't think this is x86-specific.  Or, that is, I don't think it
    would pollute the machine-independent code.  FreeBSD has virtually no
    notion of 'page tables' outside the i386-specific VM files... it doesn't
    use page tables (or two-level page-like tables... is Linux still using
    those?) to store meta information at all in the higher levels of the
    kernel.  It uses architecture-independent VM objects and vm_map_entry
    structures for that.  Physical page tables on FreeBSD are 
    throw-away-at-any-time entities.  The actual implementation of the
    'page table' in the IA32 sense occurs entirely in the machine-dependent
    subdirectory for IA32.  
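    The "throw-away" property can be illustrated with a minimal sketch,
    assuming a toy split in which machine-independent structures (a mapping
    entry and its backing object) hold all of the truth, and the
    machine-dependent page table is only a cache rebuilt on demand. The
    names (mi_map_entry, mi_object, md_pagetable, md_enter, md_discard,
    mi_fault, translate) are hypothetical and are not FreeBSD's real
    structures or signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of MI/MD layering; not FreeBSD's real API. */
#define NPAGES 16

/* Machine-independent: which object pages back the mapping. */
struct mi_object {
    unsigned long frame[NPAGES];      /* physical frame of each object page */
};

struct mi_map_entry {
    size_t start_page, npages;        /* virtual range, in pages */
    struct mi_object *obj;            /* backing object */
    size_t obj_offset;                /* first object page mapped */
};

/* Machine-dependent: a pte cache that may be discarded at any time. */
struct md_pagetable {
    unsigned long pte[NPAGES];        /* 0 = not present */
};

/* MD hook: install one translation (stand-in for per-architecture code). */
static void md_enter(struct md_pagetable *pt, size_t va_page, unsigned long frame)
{
    pt->pte[va_page] = frame;
}

/* MD hook: throw the whole table away. Nothing is lost, because the MI
 * layer still knows everything needed to rebuild it. */
void md_discard(struct md_pagetable *pt)
{
    memset(pt->pte, 0, sizeof pt->pte);
}

/* MI fault handler: consult only MI structures, then ask the MD layer to
 * (re)create the pte. Returns the frame, or 0 if the address is unmapped. */
unsigned long mi_fault(const struct mi_map_entry *e, struct md_pagetable *pt,
                       size_t va_page)
{
    if (va_page >= NPAGES ||
        va_page < e->start_page || va_page >= e->start_page + e->npages)
        return 0;                     /* a real kernel would signal a fault */
    unsigned long frame = e->obj->frame[e->obj_offset + (va_page - e->start_page)];
    md_enter(pt, va_page, frame);
    return frame;
}

/* Translate an address, faulting into the MI layer on a cache miss. */
unsigned long translate(const struct mi_map_entry *e, struct md_pagetable *pt,
                        size_t va_page)
{
    if (va_page >= NPAGES)
        return 0;
    if (pt->pte[va_page] != 0)
        return pt->pte[va_page];
    return mi_fault(e, pt, va_page);
}
```

    Because md_discard() loses no information, the MD layer is free to
    implement its tables however it likes, including sharing them.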

    A page-table sharing mechanism would have to record the knowledge (the
    'potential' for sharing) at a higher level, in the vm_map_entry 
    structure, but it would be up to the machine-dependent VM code to
    implement any actual sharing given that knowledge.  So while the specific
    implementation for IA32 is definitely machine-specific, it would have
    no effect on other OS ports (of course, we have only one other
    working port at the moment, to the alpha, but you get the idea).
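    A minimal sketch of that division of labor, with hypothetical names
    throughout (struct map_entry, mi_share_candidate, ia32_can_share_ptpage,
    generic_can_share_ptpage). The MI layer only flags an entry as a sharing
    candidate (shared, or private read-only); the IA32 layer then decides
    whether a whole page-table page can actually be shared. It assumes
    non-PAE IA32, where one page-table page maps 1024 * 4 KB = 4 MB, and
    assumes sharing requires the entry to cover that entire aligned span. A
    port that does not implement sharing simply always says no.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch; names are illustrative, not FreeBSD's API. */
#define ENTRY_PROT_WRITE 0x2u
#define ENTRY_SHARED     0x1u       /* mapping was created shared */

struct map_entry {
    uint32_t start, end;            /* [start, end) virtual addresses */
    unsigned prot;                  /* protection bits */
    unsigned flags;
    const void *object;             /* backing object identity */
};

/* MI layer: records only the *potential* for sharing. */
bool mi_share_candidate(const struct map_entry *e)
{
    return (e->flags & ENTRY_SHARED) || !(e->prot & ENTRY_PROT_WRITE);
}

/* MD layer for IA32 (non-PAE, assumed): one page-table page maps 4 MB, so
 * it can be shared only if this entry covers the whole 4 MB-aligned span
 * containing va. */
#define PTPAGE_SPAN (1024u * 4096u)

bool ia32_can_share_ptpage(const struct map_entry *e, uint32_t va)
{
    assert((PTPAGE_SPAN & (PTPAGE_SPAN - 1)) == 0);  /* mask trick needs 2^n */
    if (!mi_share_candidate(e))
        return false;
    uint32_t lo = va & ~(PTPAGE_SPAN - 1);
    uint64_t hi = (uint64_t)lo + PTPAGE_SPAN;
    return e->start <= lo && (uint64_t)e->end >= hi;
}

/* MD layer for a port that does not implement page-table sharing. */
bool generic_can_share_ptpage(const struct map_entry *e, uint32_t va)
{
    (void)e;
    (void)va;
    return false;
}
```

    A fuller version would also have to check that the sharers map the same
    object at the same offset; that is left out here.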

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

Thread overview: 46+ messages
2000-08-02 22:08 Rik van Riel
2000-08-03  7:19 ` Chris Wedgwood
2000-08-03 16:01   ` Rik van Riel
2000-08-04 15:41     ` Matthew Dillon
2000-08-04 17:49       ` Linus Torvalds
2000-08-04 23:51         ` Matthew Dillon
2000-08-05  0:03           ` Linus Torvalds
2000-08-05  1:52             ` Matthew Dillon [this message]
2000-08-05  1:09               ` Matthew Wilcox
2000-08-05  2:05               ` Linus Torvalds
2000-08-05  2:17               ` Alexander Viro
2000-08-07 17:55                 ` Matthew Dillon
2000-08-05 22:48     ` Theodore Y. Ts'o
2000-08-03 18:27   ` lamont
2000-08-03 18:34     ` Linus Torvalds
2000-08-03 19:11       ` Chris Wedgwood
2000-08-03 21:04         ` Benjamin C.R. LaHaise
2000-08-03 19:32       ` Rik van Riel
2000-08-03 18:05 ` Linus Torvalds
2000-08-03 18:50   ` Rik van Riel
2000-08-03 20:22     ` Linus Torvalds
2000-08-03 22:05       ` Rik van Riel
2000-08-03 22:19         ` Linus Torvalds
2000-08-03 19:00   ` Richard B. Johnson
2000-08-03 19:29     ` Rik van Riel
2000-08-03 20:23     ` Linus Torvalds
2000-08-03 19:37   ` Ingo Oeser
2000-08-03 20:40     ` Linus Torvalds
2000-08-03 21:56       ` Ingo Oeser
2000-08-03 22:12         ` Linus Torvalds
2000-08-04  2:33   ` David Gould
2000-08-16 15:10   ` Stephen C. Tweedie
2000-08-03 19:26 ` Roger Larsson
2000-08-03 21:50   ` Rik van Riel
2000-08-03 22:28     ` Roger Larsson
2000-08-04 13:52 Mark_H_Johnson
     [not found] <8725692F.0079E22B.00@d53mta03h.boulder.ibm.com>
2000-08-07 17:40 ` Gerrit.Huizenga
2000-08-07 18:37   ` Matthew Wilcox
2000-08-07 20:55   ` Chuck Lever
2000-08-07 21:59     ` Rik van Riel
2000-08-08  3:26   ` David Gould
2000-08-08  5:54     ` Kanoj Sarcar
2000-08-08  7:15       ` David Gould
     [not found] <87256934.0072FA16.00@d53mta04h.boulder.ibm.com>
2000-08-08  0:36 ` Gerrit.Huizenga
     [not found] <87256934.0078DADB.00@d53mta03h.boulder.ibm.com>
2000-08-08  0:48 ` Gerrit.Huizenga
2000-08-08 15:21   ` Rik van Riel
