From: Michel Lespinasse <walken@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, Hugh Dickins <hughd@google.com>,
Rik van Riel <riel@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Nick Piggin <npiggin@kernel.dk>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 1/6] mlock: only hold mmap_sem in shared mode when faulting in pages
Date: Wed, 8 Dec 2010 15:58:21 -0800
Message-ID: <AANLkTikYZi0=c+yM1p8H18u+9WVbsQXjAinUWyNt7x+t@mail.gmail.com>
In-Reply-To: <20101208152740.ac449c3d.akpm@linux-foundation.org>
On Wed, Dec 8, 2010 at 3:27 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>> Currently mlock() holds mmap_sem in exclusive mode while the pages get
>> faulted in. In the case of a large mlock, this can potentially take a
>> very long time, during which various commands such as 'ps auxw' will
>> block. This makes sysadmins unhappy:
>>
>> real 14m36.232s
>> user 0m0.003s
>> sys 0m0.015s
>> (output from 'time ps auxw' while a 20GB file was being mlocked without
>> being previously preloaded into page cache)
>
> The kernel holds down_write(mmap_sem) for 14m36s?
Yes...
[... patch snipped off ...]
> Am I correct in believing that we'll still hold down_read(mmap_sem) for
> a quarter hour?
Yes, patch 1/6 changes the long hold time to be in read mode instead
of write mode, which is only a band-aid. But it prepares for patch
5/6, which releases mmap_sem whenever there is contention on it or
when blocking on disk reads.
> We don't need to hold mmap_sem at all while faulting in those pages,
> do we? We could just do
>
> for (addr = start; addr < end; addr += PAGE_SIZE)
> get_user(x, addr);
>
> and voila. If the pages are in cache and the ptes are set up then that
> will be *vastly* faster than the proposed code. If the get_user()
> takes a minor fault then it'll be slower. If it's a major fault then
> the difference probably doesn't matter much.
get_user wouldn't suffice if the page is already mapped in, as we need
to mark it as PageMlocked. Also, we need to skip IO and PFNMAP
regions. I don't think you can make things much simpler than what I
ended up with.
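For comparison, Andrew's get_user() loop can be approximated in userspace by reading one byte per page. This is only a sketch under stated assumptions: prefault_by_touching is a hypothetical name, the page size is passed in by the caller (for example from sysconf(_SC_PAGESIZE)), and, as noted in the reply above, touching pages merely faults them in. It neither marks them PageMlocked nor skips VM_IO/VM_PFNMAP regions, which is why it cannot replace the mlock path.

```c
#include <stddef.h>

/* Read one byte from each page of [buf, buf + len) so the kernel
 * faults the page in. Returns the number of pages touched.
 * The volatile qualifier keeps the compiler from eliding the reads.
 * Note: this does NOT lock the pages in memory. */
size_t prefault_by_touching(const volatile char *buf, size_t len,
                            size_t page_size)
{
    size_t touched = 0;

    for (size_t off = 0; off < len; off += page_size) {
        char c = buf[off]; /* volatile read: one access per page */
        (void)c;
        touched++;
    }
    return touched;
}
```

A partial last page still counts as one page touched, matching the loop in the quoted message, which steps by PAGE_SIZE from start until it reaches end.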
> But whatever. Is this patchset a half-fix, and should we rather be
> looking for a full-fix?
I think the series fully fixes the mlock() and mlockall() cases, which
have been the more pressing use cases for us.
Even then, there are cases where we could still observe long
mmap_sem hold times. Fundamentally, every place that calls
get_user_pages (or do_mmap, in the mlockall MCL_FUTURE case) with a
large page range may create such problems. From the looks of it, most
of these places wouldn't actually care if mmap_sem got dropped in
the middle of the operation, but a general fix will have to involve
looking at all the call sites to be sure.
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
Thread overview: 22+ messages
2010-12-03 0:16 [PATCH 0/6] mlock: do not hold mmap_sem for extended periods of time Michel Lespinasse
2010-12-03 0:16 ` [PATCH 1/6] mlock: only hold mmap_sem in shared mode when faulting in pages Michel Lespinasse
2010-12-08 23:27 ` Andrew Morton
2010-12-08 23:58 ` Michel Lespinasse [this message]
2010-12-10 6:11 ` Linus Torvalds
2010-12-10 6:39 ` Michel Lespinasse
2010-12-10 11:12 ` Peter Zijlstra
2010-12-14 0:51 ` Michel Lespinasse
2010-12-14 1:05 ` Andrew Morton
2010-12-14 1:26 ` Michel Lespinasse
2010-12-14 15:43 ` Linus Torvalds
2010-12-14 23:22 ` Michel Lespinasse
2010-12-03 0:16 ` [PATCH 2/6] mm: add FOLL_MLOCK follow_page flag Michel Lespinasse
2010-12-04 6:55 ` Michel Lespinasse
2010-12-03 0:16 ` [PATCH 3/6] mm: move VM_LOCKED check to __mlock_vma_pages_range() Michel Lespinasse
2010-12-03 0:16 ` [PATCH 4/6] rwsem: implement rwsem_is_contended() Michel Lespinasse
2010-12-03 0:16 ` [PATCH 5/6] mlock: do not hold mmap_sem for extended periods of time Michel Lespinasse
2010-12-08 23:42 ` Andrew Morton
2010-12-03 0:16 ` [PATCH 6/6] x86 rwsem: more precise rwsem_is_contended() implementation Michel Lespinasse
2010-12-03 22:41 ` Peter Zijlstra
2010-12-03 22:51 ` Michel Lespinasse
2010-12-03 23:02 ` [PATCH 0/6] mlock: do not hold mmap_sem for extended periods of time Michel Lespinasse