linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@osdl.org>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-mm@kvack.org,
	Linux Kernel list <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@osdl.org>,
	Mike Waychison <mikew@google.com>
Subject: Re: [RFC] page fault retry with NOPAGE_RETRY
Date: Fri, 15 Sep 2006 00:11:51 -0700	[thread overview]
Message-ID: <20060915001151.75f9a71b.akpm@osdl.org> (raw)
In-Reply-To: <1158274508.14473.88.camel@localhost.localdomain>

On Fri, 15 Sep 2006 08:55:08 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> in mm.h:
> 
>  #define NOPAGE_SIGBUS   (NULL)
>  #define NOPAGE_OOM      ((struct page *) (-1))
> +#define NOPAGE_RETRY	((struct page *) (-2))
> 
> and in memory.c, in do_no_page():
> 
> 
>         /* no page was available -- either SIGBUS or OOM */
>         if (new_page == NOPAGE_SIGBUS)
>                 return VM_FAULT_SIGBUS;
>         if (new_page == NOPAGE_OOM)
>                 return VM_FAULT_OOM;
> +       if (new_page == NOPAGE_RETRY)
> +               return VM_FAULT_MINOR;

Google are using such a patch (Mike owns it).

It is to reduce mmap_sem contention with threaded apps.  If one thread
takes a major pagefault, it will perform a synchronous disk read while
holding down_read(mmap_sem).  This causes any other thread which wishes to
perform any mmap/munmap/mprotect/etc (which does down_write(mmap_sem)) to
block behind that disk IO.  As you can understand, that can be pretty bad
in the right circumstances.

The patch modifies filemap_nopage() to look to see if it needs to block on
the page coming uptodate and if so, it does up_read(mmap_sem);
wait_on_page_locked(); return NOPAGE_RETRY.  That causes the top-level
do_page_fault() code to rerun the entire pagefault.

It hasn't been submitted for upstream yet because

a) To avoid livelock possibilities (page reclaim, looping FADV_DONTNEED,
   etc) it only does the retry a single time.  After that it falls back to
   the traditional synchronous-read-inside-mmap_sem approach.  A flag in
   current->flags is used to detect the second attempt.  It keeps the patch
   simple, but is a bit hacky.

   To resolve this cleanly I'm thinking we change all the pagefault code
   everywhere: instantiate a new `struct pagefault_args' in do_page_fault()
   and pass that all around the place.  So all the pagefault code, all the
   ->nopage handlers etc will take a single argument.

   This will, I hope, result in less code, faster code and less stack
   consumption.  It could also be used for things like the
   lock-the-page-in-filemap_nopage() proposal: the ->nopage()
   implementation could set a boolean in pagefault_args indicating whether
   the page has been locked.

   And, of course, fielmap_nopage could set another boolean in
   pagefault_args to indicate that it has already tried to rerun the
   pagefault once.

b) It could be more efficient.  Most of the time, there's no need to
   back all the way out of the pagefault handler and rerun the whole thing.
   Because most of the time, nobody changed anything in the mm_struct.  We
   _could_ just retake the mmap_sem after the page comes uptodate and, if
   nothing has changed, proceed.  I see two ways of doing this:

   - The simple way: look to see if any other processes are sharing
     this mm_struct.  If not, just do the synchronous read inside mmap_sem.

   - The better way: put a sequence counter in the mm_struct,
     increment that in every place where down_write(mmap_sem) is performed.
      The pagefault code then can re-take the mmap_sem for reading and look
     to see if the sequence counter is unchanged.  If it is, proceed.  If
     it _has_ changed then drop mmap_sem again and return NOPAGE_RETRY.

otoh, maybe using another bit in page->flags is a good compromise ;)

Mike, could you whip that patch out please?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2006-09-15  7:11 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-14 22:55 Benjamin Herrenschmidt
2006-09-15  0:19 ` Linus Torvalds
2006-09-15  7:11 ` Andrew Morton [this message]
2006-09-15  7:35   ` Andrew Morton
2006-09-15 13:30     ` Hugh Dickins
2006-09-16  1:03       ` Benjamin Herrenschmidt
2006-09-19 23:35   ` Mike Waychison
2006-09-19 23:50     ` Benjamin Herrenschmidt
2006-09-19 23:59       ` Andrew Morton
2006-09-20  0:06         ` Benjamin Herrenschmidt
2006-09-20  0:05       ` Benjamin Herrenschmidt
2006-09-20  0:21         ` Andrew Morton
2006-09-20  1:57           ` Benjamin Herrenschmidt
2006-09-20  3:05             ` Andrew Morton
2006-09-20  5:04               ` Benjamin Herrenschmidt
2006-09-20  5:26                 ` Andrew Morton
2006-09-20  6:54                   ` Benjamin Herrenschmidt
2006-09-20 17:53                     ` Andrew Morton
2006-09-21 22:05                       ` Benjamin Herrenschmidt
2006-09-21 22:41                         ` Andrew Morton
2006-09-21 23:09                           ` Benjamin Herrenschmidt
2006-09-23 14:21                       ` Hugh Dickins
2006-09-23 19:46                         ` Andrew Morton
2006-09-23 22:35                           ` Benjamin Herrenschmidt
2006-09-20  5:06               ` Benjamin Herrenschmidt
2006-09-20  1:14       ` Mike Waychison
2006-09-20  2:02         ` Benjamin Herrenschmidt
2006-09-15 21:35 ` Arnd Bergmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060915001151.75f9a71b.akpm@osdl.org \
    --to=akpm@osdl.org \
    --cc=benh@kernel.crashing.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mikew@google.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox