linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
	Andi Kleen <ak@linux.intel.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Dave Chinner <david@fromorbit.com>, linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in page cache
Date: Mon, 17 Feb 2014 11:01:58 -0800	[thread overview]
Message-ID: <CA+55aFwz+36NOk=uanDvii7zn46-s1kpMT1Lt=C0hhhn9v6w-Q@mail.gmail.com> (raw)
In-Reply-To: <1392662333-25470-1-git-send-email-kirill.shutemov@linux.intel.com>

On Mon, Feb 17, 2014 at 10:38 AM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Now we have ->fault_nonblock() to ask filesystem for a page, if it's
> reachable without blocking. We request one page a time. It's not terribly
> efficient and I will probably re-think the interface once again to expose
> iterator or something...

Hmm. Yeah, clearly this isn't working, since the real workloads all
end up looking like

>        115,493,976      minor-faults                                                  ( +-  0.00% ) [100.00%]
>       59.686645587 seconds time elapsed                                          ( +-  0.30% )
 becomes
>         47,428,068      minor-faults                                                  ( +-  0.00% ) [100.00%]
>       60.241766430 seconds time elapsed                                          ( +-  0.85% )

and

>        268,039,365      minor-faults                                                 [100.00%]
>      132.830612471 seconds time elapsed
becomes
>        193,550,437      minor-faults                                                 [100.00%]
>      132.851823758 seconds time elapsed

and

>          4,967,540      minor-faults                                                  ( +-  0.06% ) [100.00%]
>       27.215434226 seconds time elapsed                                          ( +-  0.18% )
becomes
>          2,285,563      minor-faults                                                  ( +-  0.26% ) [100.00%]
>       27.292854546 seconds time elapsed                                          ( +-  0.29% )

ie it shows a clear reduction in faults, but the added costs clearly
eat up any wins and it all becomes (just _slightly_) slower.

Sad.

I do wonder if we really need to lock the pages we fault in. We lock
them in order to test for being up-to-date and still mapped. The
up-to-date check we don't really need to worry about: that we can test
without locking by just reading "page->flags" atomically and verifying
that it's uptodate and not locked.

The other reason to lock the page is:

 - for anonymous pages we need the lock for rmap, so the VM generally
always locks the page. But that's not an issue for file-backed pages:
the "rmap" for a filebacked page is just the page mapcount and the
cgroup statistics, and those don't need the page lock.

 - the whole truncation/unmapping thing

So the complex part is racing with truncate/unmapping the page. But
since we hold the page table lock, I *think* what we should be able to
do is:

 - increment the page _mapcount (iow, do "page_add_file_rmap()"
early). This guarantees that any *subsequent* unmap activity on this
page will walk the file mapping lists, and become serialized by the
page table lock we hold.

 - mb_after_atomic_inc() (this is generally free)

 - test that the page is still unlocked and uptodate, and the page
mapping still points to our page.

 - if that is true, we're all good, we can use the page, otherwise we
decrement the mapcount (page_remove_rmap()) and skip the page.

Hmm? Doing something like this means that we would never lock the
pages we prefault, and you can go back to your gang lookup rather than
that "one page at a time". And the race case is basically never going
to trigger.

Comments?

                  Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2014-02-17 19:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-17 18:38 Kirill A. Shutemov
2014-02-17 18:38 ` [PATCH 1/2] mm: introduce vm_ops->fault_nonblock() Kirill A. Shutemov
2014-02-17 18:38 ` [PATCH 2/2] mm: implement ->fault_nonblock() for page cache Kirill A. Shutemov
2014-02-17 19:01 ` Linus Torvalds [this message]
2014-02-17 19:49   ` [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in " Kirill A. Shutemov
2014-02-17 20:24     ` Linus Torvalds
2014-02-18 13:28   ` Rik van Riel
2014-02-18 14:15     ` Wilcox, Matthew R
2014-02-18 18:02       ` Linus Torvalds
2014-02-18 18:53         ` Matthew Wilcox
2014-02-18 19:07           ` Linus Torvalds
2014-02-18 14:23     ` Kirill A. Shutemov
2014-02-18 17:51     ` Linus Torvalds
2014-02-18 17:59   ` Kirill A. Shutemov
2014-02-18 18:07     ` Kirill A. Shutemov
2014-02-18 18:28       ` Linus Torvalds
2014-02-18 23:57         ` Kirill A. Shutemov
2014-02-19  0:29           ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+55aFwz+36NOk=uanDvii7zn46-s1kpMT1Lt=C0hhhn9v6w-Q@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@fromorbit.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.r.wilcox@intel.com \
    --cc=mgorman@suse.de \
    --cc=riel@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox