linux-mm.kvack.org archive mirror
From: Michel Lespinasse <walken@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Nick Piggin <npiggin@kernel.dk>, Rik van Riel <riel@redhat.com>
Subject: Re: [RFC] mlock: release mmap_sem every 256 faulted pages
Date: Tue, 23 Nov 2010 12:55:01 -0800
Message-ID: <AANLkTi=dK9wQaHm=tXOCqN2BDw5jEtH5qfs9zRHbE0qT@mail.gmail.com>
In-Reply-To: <20101122215746.e847742d.akpm@linux-foundation.org>

On Mon, Nov 22, 2010 at 9:57 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon, 22 Nov 2010 21:00:52 -0800 Michel Lespinasse <walken@google.com> wrote:
>> I'd like to solicit comments on this proposal:
>>
>> Currently mlock() holds mmap_sem in exclusive mode while the pages get
>> faulted in. In the case of a large mlock, this can potentially take a
>> very long time.
>
> A more compelling description of why this problem needs addressing
> would help things along.

Oh my. It's probably not too useful for desktops, where such large
mlocks are hopefully uncommon.

At google we have many applications that serve data from memory and
don't want to allow for disk latencies. Some of the simpler ones use
mlock (though there are other ways - anon memory running with swap
disabled is a surprisingly popular choice).

Kosaki is also showing interest in mlock, though I'm not sure what his
use case is.

Due to the large scope of mmap_sem, there are many things that may
block while mlock() runs. If there are other threads running (and most
of our programs are threaded from an early point in their execution),
the threads might block on a page fault that needs to acquire
mmap_sem. Also, reads of various files such as /proc/pid/maps block
until mmap_sem is released.
This is a problem for us because our cluster software can't monitor
what's going on with that process - not by talking to it as the
required threads might block, nor by looking at it through /proc
files.

A separate, personal interest is that I'm still carrying the
(admittedly poor-taste) down_read_unfair() patches internally, and I
would be able to drop them if only long mmap_sem hold times could be
eliminated.

>> +             /*
>> +              * Limit batch size to 256 pages in order to reduce
>> +              * mmap_sem hold time.
>> +              */
>> +             nfault = nstart + 256 * PAGE_SIZE;
>
> It would be nicer if there was an rwsem API to ask if anyone is
> currently blocked in down_read() or down_write().  That wouldn't be too
> hard to do.  It wouldn't detect people polling down_read_trylock() or
> down_write_trylock() though.

I can do that. I actually thought about it myself, but then dismissed
it as too fancy for version 1. The only problem is that this would go
into per-architecture files which I can't test, but I wouldn't have to
actually write asm, so this may be OK. Callers polling
down_read_trylock() are no problem, as these calls will succeed unless
there is a queued writer, which we can easily detect.
down_write_trylock() is seldom used; the only caller I could find for
mmap_sem is drivers/infiniband/core/umem.c, and it'll do a regular
down_write() soon enough if the initial try fails.
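Andrew's suggested query, and the trylock behaviour described above,
can be modelled with a small userspace lock that counts blocked
waiters. Everything here is hypothetical (struct toy_rwsem,
toy_rwsem_is_contended() and the other toy_* functions are not kernel
API); it only shows the bookkeeping: readers queue behind a waiting
writer, the read trylock fails only when a writer holds the lock or is
queued, and threads that merely poll a trylock are not counted as
contention.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical writer-preferring rwsem that tracks blocked waiters. */
struct toy_rwsem {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int active_readers;
    bool active_writer;
    int waiting_readers;   /* threads blocked in toy_down_read() */
    int waiting_writers;   /* threads blocked in toy_down_write() */
};

#define TOY_RWSEM_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false, 0, 0 }

static void toy_down_read(struct toy_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    s->waiting_readers++;
    /* Queue behind an active or waiting writer. */
    while (s->active_writer || s->waiting_writers > 0)
        pthread_cond_wait(&s->cond, &s->lock);
    s->waiting_readers--;
    s->active_readers++;
    pthread_mutex_unlock(&s->lock);
}

/* Succeeds unless a writer holds the lock or is queued. */
static bool toy_down_read_trylock(struct toy_rwsem *s)
{
    bool ok;

    pthread_mutex_lock(&s->lock);
    ok = !s->active_writer && s->waiting_writers == 0;
    if (ok)
        s->active_readers++;
    pthread_mutex_unlock(&s->lock);
    return ok;
}

static void toy_up_read(struct toy_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    s->active_readers--;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

static void toy_down_write(struct toy_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    s->waiting_writers++;
    while (s->active_writer || s->active_readers > 0)
        pthread_cond_wait(&s->cond, &s->lock);
    s->waiting_writers--;
    s->active_writer = true;
    pthread_mutex_unlock(&s->lock);
}

static void toy_up_write(struct toy_rwsem *s)
{
    pthread_mutex_lock(&s->lock);
    s->active_writer = false;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Is anyone blocked in toy_down_read()/toy_down_write()? Trylock pollers are not seen. */
static bool toy_rwsem_is_contended(struct toy_rwsem *s)
{
    bool contended;

    pthread_mutex_lock(&s->lock);
    contended = s->waiting_readers > 0 || s->waiting_writers > 0;
    pthread_mutex_unlock(&s->lock);
    return contended;
}
```

A batching loop could call toy_rwsem_is_contended() after each page and
drop the lock only when it returns true. This toy lock does not hand
the lock to a waiter on release, so a thread that unlocks and
immediately relocks may win the race again; a real implementation would
need to make sure the waiter actually gets in.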

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
