reverse pte lookups and anonymous private mappings; avl trees?

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Benjamin C.R. LaHaise" <blah@kvack.org>
To: linux-mm@kvack.org
Cc: torvalds@transmeta.com
Subject: reverse pte lookups and anonymous private mappings; avl trees?
Date: Mon, 2 Mar 1998 14:04:01 -0500 (U\x01)	[thread overview]
Message-ID: <Pine.LNX.3.95.980302131917.32327E-100000@as200.spellcast.com> (raw)

Hello,

Okay, I've been ripping my hair out this weekend trying to get my reverse
pte lookups working using the inode/vma walking scheme, and I'm about to
go mad because of it.  Here are the alternatives I've come up with. 
Please give me some comments as to which one people think is the best from
a design standpoint (hence the cc to Linus) and performance wise, too. 
Note that I'm ignoring the problem of overlapping inode/offset pairs given
the new swap cache code.  (this is probably riddled with typos/thinkos)

	a)  add a struct vm_area_struct pointer and vm_offset to struct
page.  To each vm_area_struct, add a pair of pointers so that there's a
list of vm_area_structs which are private, but can be sharing pages.
Hence we end up with something like:

	page--> vma -> vm_next_private -> ... (cicular list)
	         |          |
	       vm_mm      vm_mm
	         |         ...
	        ... (use page->vm_offset - vma->vm_offset + vma->vm_start)
	         |
	        pte

This, I think, is the cleanest approach.  It makes life easy for finding a
shared anonymous pages, but has the downside of adding 8 bytes onto the
page map (16 on the 64 bit machines), same to the vm_area_struct.  Perhaps
the vma's vm_next_share/vm_pprev_share pointers could be reused (although
the seperate private mapping share list would make searching for
non-anonymous, but private mappings faster).

	b) per-vma inodes.  This scheme is a headache.  It would involve
linking the inodes in order to maintain a chain of the anonymous pages.
At fork() time, each private anonymous vma would need to have two new
inodes allocated (one for each task), which would point to the old inode.
Ptes are found using the inode, offset pair already in struct page, plus
walking the inode tree.  Has the advantage that we can use the inode,
offset pair already in struct page, no need to grow struct vm_area_struct;
disadvantage: hairy, conflicts with the new swap cache code - requires
the old swap_entry to return to struct page.

	c) per-mm inodes, shared on fork.  Again, this one is a bit
painful, although less so than b.  This one requires that we allow
multiple pages in the page cache exist with the same offset (ugly).  Each
anonymous page gets the mm_struct's inode attached to it, with the virtual
address within the process' memory space being the offset.  The ptes are
found in the same manner as for normal shared pages (inode->i_mmap->vmas).
Aliased page-cache entries are created on COW and mremap().

My thoughts are that the only real options are (a) and (c).  (a) seems to
be the cleanest conceptually, while (c) has very little overhead, but,
again, conflicts with the new swap cache...

On another note, is there any particular reason why the AVL tree for vma's
was removed in 2.1?  Because of recent changes to use the struct file * in
the vma, vma's aren't going to be coalesced as much, and some of the
private/anon changes I'm suggesting could contribute to that even further. 
I seem to remember someone suggesting the use of red-black trees as an
alternative, and methinks a friend has some code we can borrow. Just as a
note, with the introduction of PAM, quite a few daemons have ~30 vma's. 
If each time we want to steal a page we have to do a 30 element list walk,
the complexity of the swapper remains a bit high in my opinion (as kswapd
is now).

		-ben

next             reply	other threads:[~1998-03-02 19:04 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1998-03-02 19:04 Benjamin C.R. LaHaise [this message]
1998-03-02 23:03 ` Stephen C. Tweedie
1998-03-02 23:37   ` Rik van Riel
1998-03-03  6:29   ` Benjamin C.R. LaHaise
1998-03-03 23:49     ` Stephen C. Tweedie
1998-03-04 21:26     ` Stephen C. Tweedie
1998-03-04 23:20       ` Rik van Riel
1998-03-05  4:21         ` Benjamin C.R. LaHaise
1998-03-05 22:25         ` Stephen C. Tweedie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.3.95.980302131917.32327E-100000@as200.spellcast.com \
    --to=blah@kvack.org \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox