From: Andrew Morton <akpm@zip.com.au>
To: Andrea Arcangeli <andrea@suse.de>
Cc: Linus Torvalds <torvalds@transmeta.com>,
"Martin J. Bligh" <fletch@aracnet.com>,
Rik van Riel <riel@conectiva.com.br>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: scalable kmap (was Re: vm lock contention reduction)
Date: Mon, 08 Jul 2002 13:39:04 -0700 [thread overview]
Message-ID: <3D29F868.1338ACF3@zip.com.au> (raw)
In-Reply-To: <20020708080953.GC1350@dualathlon.random>
Andrea Arcangeli wrote:
>
> ...
> > generic_file_write()
> > {
> > ...
> > atomic_inc(¤t->mm->dont_unmap_pages);
> >
> > {
> > volatile char dummy;
> > __get_user(dummy, addr);
> > __get_user(dummy, addr+bytes+1);
> > }
> > lock_page();
> > ->prepare_write()
> > kmap_atomic()
> > copy_from_user()
> > kunmap_atomic()
> > ->commit_write()
> > atomic_dec(¤t->mm->dont_unmap_pages);
> > unlock_page()
> > }
> >
> > and over in mm/rmap.c:try_to_unmap_one(), check mm->dont_unmap_pages.
> >
> > Obviously, all this is dependent on CONFIG_HIGHMEM.
> >
> > Workable?
>
> the above pseudocode still won't work correctly,
Sure. It's crap. It can be used to get mlockall() for free.
> if you don't pin the
> page as Martin proposed and you only rely on its virtual mapping to stay
> there because the page can go away under you despite the
> swap_out/rmap-unmapping work, if there's a parallel thread running
> munmap+re-mmap under you. So at the very least you need the mmap_sem at
> every generic_file_write to avoid other threads to change your virtual
> address under you. And you'll basically need to make the mmap_sem
> recursive, because you have to take it before running __get_user to
> avoid races. You could easily do that using my rwsem, I made two versions
> of them, with one that supports recursion, however this is just for your
> info, I'm not suggesting to make it recursive.
I think I'll just go for pinning the damn page. It's a spinlock and
maybe three cachelines but the kernel is about to do a 4k memcpy
anyway. And get_user_pages() doesn't show up much on O_DIRECT
profiles and it'll be a net win and we need to do SOMETHING, dammit.
> ...
> The only reason I can imagine rmap useful in todays
> hardware for all kind of vma (what the patch provides compared to what
> we have now) is to more efficiently defragment ram with an algorithm in
> the memory balancing to provide largepages more efficiently from mixed
> zones, if somebody would suggest rmap for this reason (nobody did yet)
It has been discussed. But no action yet.
> I
> would have to agree completely that it is very useful for that, OTOH it
> seems everybody is reserving (or planning to reserve) a zone for
> largepages anyways so that we don't run into fragmentation in the first
> place. And btw - talking about largepages - we have three concurrent and
> controversial largepage implementations for linux available today, they
> all have different API, one is even shipped in production by a vendor,
What implementation do you favour?
> and while auditing the code I seen it also exports an API visible to
> userspace [ignoring the sysctl] (unlike what I was told):
>
> +#define MAP_BIGPAGE 0x40 /* bigpage mapping */
> [..]
> _trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN) |
> _trans(flags, MAP_DENYWRITE, VM_DENYWRITE) |
> + _trans(flags, MAP_BIGPAGE, VM_BIGMAP) |
> _trans(flags, MAP_EXECUTABLE, VM_EXECUTABLE);
> return prot_bits | flag_bits;
> #undef _trans
>
> that's a new unofficial bitflag to mmap that any proprietary userspace
> can pass to mmap today. Other implementations of the largepage feature
> use madvise or other syscalls to tell the kernel to allocate
> largepages. At least the above won't return -EINVAL so the binaryonly
> app will work transparently on a mainline kernel, but it can eventually
> malfunction if we use 0x40 for something else in 2.5. So I think we
> should do something about the largepages too ASAP into 2.5 (like
> async-io).
Yup. I don't think the -aa kernel has a large page patch, does it?
Is that something which you have time to look into?
-
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
next prev parent reply other threads:[~2002-07-08 20:39 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2002-07-04 23:05 vm lock contention reduction Andrew Morton
2002-07-04 23:26 ` Rik van Riel
2002-07-04 23:27 ` Rik van Riel
2002-07-05 1:37 ` Andrew Morton
2002-07-05 1:49 ` Rik van Riel
2002-07-05 2:18 ` Andrew Morton
2002-07-05 2:16 ` Rik van Riel
2002-07-05 2:53 ` Andrew Morton
2002-07-05 3:52 ` Benjamin LaHaise
2002-07-05 4:47 ` Linus Torvalds
2002-07-05 5:38 ` Andrew Morton
2002-07-05 5:51 ` Linus Torvalds
2002-07-05 6:08 ` Linus Torvalds
2002-07-05 6:27 ` Alexander Viro
2002-07-05 6:33 ` Andrew Morton
2002-07-05 7:33 ` Andrea Arcangeli
2002-07-07 2:50 ` Andrew Morton
2002-07-07 3:05 ` Linus Torvalds
2002-07-07 3:47 ` Andrew Morton
2002-07-08 11:39 ` Enhanced profiling support (was Re: vm lock contention reduction) John Levon
2002-07-08 17:52 ` Linus Torvalds
2002-07-08 18:41 ` Karim Yaghmour
2002-07-10 2:22 ` John Levon
2002-07-10 4:16 ` Karim Yaghmour
2002-07-10 4:38 ` John Levon
2002-07-10 5:46 ` Karim Yaghmour
2002-07-10 13:10 ` bob
2002-07-07 5:16 ` vm lock contention reduction Martin J. Bligh
2002-07-07 6:13 ` scalable kmap (was Re: vm lock contention reduction) Martin J. Bligh
2002-07-07 6:37 ` Andrew Morton
2002-07-07 7:53 ` Linus Torvalds
2002-07-07 9:04 ` Andrew Morton
2002-07-07 16:13 ` Martin J. Bligh
2002-07-07 18:31 ` Linus Torvalds
2002-07-07 18:55 ` Linus Torvalds
2002-07-07 19:02 ` Linus Torvalds
2002-07-08 7:24 ` Andrew Morton
2002-07-08 8:09 ` Andrea Arcangeli
2002-07-08 14:50 ` William Lee Irwin III
2002-07-08 20:39 ` Andrew Morton [this message]
2002-07-08 21:08 ` Benjamin LaHaise
2002-07-08 21:45 ` Andrew Morton
2002-07-08 22:24 ` Benjamin LaHaise
2002-07-07 16:00 ` Martin J. Bligh
2002-07-07 18:28 ` Linus Torvalds
2002-07-08 7:11 ` Andrea Arcangeli
2002-07-08 10:15 ` Eric W. Biederman
2002-07-08 7:00 ` Andrea Arcangeli
2002-07-08 17:29 ` Martin J. Bligh
2002-07-08 22:14 ` Linus Torvalds
2002-07-09 0:16 ` Andrew Morton
2002-07-09 3:17 ` Andrew Morton
2002-07-09 4:28 ` Martin J. Bligh
2002-07-09 5:28 ` Andrew Morton
2002-07-09 6:15 ` Martin J. Bligh
2002-07-09 6:30 ` William Lee Irwin III
2002-07-09 6:32 ` William Lee Irwin III
2002-07-09 16:08 ` Martin J. Bligh
2002-07-09 17:32 ` Andrea Arcangeli
2002-07-10 5:32 ` Andrew Morton
2002-07-10 22:43 ` Martin J. Bligh
2002-07-10 23:08 ` Andrew Morton
2002-07-10 23:26 ` Martin J. Bligh
2002-07-11 0:19 ` Andrew Morton
2002-07-12 17:48 ` Martin J. Bligh
2002-07-13 11:18 ` Andrea Arcangeli
2002-07-09 13:59 ` Benjamin LaHaise
2002-07-08 0:38 ` vm lock contention reduction William Lee Irwin III
2002-07-05 6:46 ` Andrew Morton
2002-07-05 14:25 ` Rik van Riel
2002-07-05 23:11 ` William Lee Irwin III
2002-07-05 23:48 ` Andrew Morton
2002-07-06 0:11 ` Rik van Riel
2002-07-06 0:31 ` Linus Torvalds
2002-07-06 0:45 ` Rik van Riel
2002-07-06 0:48 ` Andrew Morton
2002-07-08 0:59 ` William Lee Irwin III
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3D29F868.1338ACF3@zip.com.au \
--to=akpm@zip.com.au \
--cc=andrea@suse.de \
--cc=fletch@aracnet.com \
--cc=linux-mm@kvack.org \
--cc=riel@conectiva.com.br \
--cc=torvalds@transmeta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox