From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Andrea Arcangeli <andrea@suse.de>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: 2.6.22 -mm merge plans -- vm bugfixes
Date: Thu, 03 May 2007 11:32:23 +1000 [thread overview]
Message-ID: <46393BA7.6030106@yahoo.com.au> (raw)
In-Reply-To: <Pine.LNX.4.64.0705021418030.16517@blonde.wat.veritas.com>
[-- Attachment #1: Type: text/plain, Size: 3875 bytes --]
Hugh Dickins wrote:
> On Wed, 2 May 2007, Nick Piggin wrote:
[snip]
> More on-topic, since you suggest doing more within vmtruncate_range
> than the filesystem: no, I'm afraid that's misdesigned, and I want
> to move almost all of it into the filesystem ->truncate_range.
> Because, if what vmtruncate_range is doing before it gets to the
> filesystem isn't to be just a waste of time, the filesystem needs
> to know what's going on in advance - just as notify_change warns
> the filesystem about a coming truncation. But easier than inventing
> some new notification is to move it all into the filesystem, with
> unmap_mapping_range+truncate_inode_pages_range its library helpers.
Well I would prefer it to follow the same pattern as regular
truncate. I don't think it is misdesigned to call the filesystem
_first_, but I think if you do that then the filesystem should
call the vm to prepare / finish truncate, rather than open code
calls to unmap itself.
>>>But I'm pretty sure (to use your words!) regular truncate was not racy
>>>before: I believe Andrea's sequence count was handling that case fine,
>>>without a second unmap_mapping_range.
>>
>>OK, I think you're right. I _think_ it should also be OK with the
>>lock_page version as well: we should not be able to have any pages
>>after the first unmap_mapping_range call, because of the i_size
>>write. So if we have no pages, there is nothing to 'cow' from.
>
>
> I'd be delighted if you can remove those later unmap_mapping_ranges.
> As I recall, the important thing for the copy pages is to be holding
> the page lock (or whatever other serialization) on the copied page
> still while the copy page is inserted into pagetable: that looks
> to be so in your __do_fault.
Yeah, I think my thought process went wrong on those... I'll
revisit.
>>>But it is a shame, and leaves me wondering what you gained with the
>>>page lock there.
>>>
>>>One thing gained is ease of understanding, and if your later patches
>>>build an edifice upon the knowledge of holding that page lock while
>>>faulting, I've no wish to undermine that foundation.
>>
>>It also fixes a bug, doesn't it? ;)
>
>
> Well, I'd come to think that perhaps the bugs would be solved by
> that second unmap_mapping_range alone, so the pagelock changes
> just a misleading diversion.
>
> I'm not sure how I feel about that: calling unmap_mapping_range a
> second time feels such a cheat, but if (big if) it does solve the
> races, and the pagelock method is as expensive as your numbers
> now suggest...
Well aside from being terribly ugly, it means we can still drop
the dirty bit where we'd otherwise rather not, so I don't think
we can do that.
I think there may be some way we can do this without taking the
page lock, and I was going to look at it, but I think it is
quite neat to just lock the page...
I don't think performance is _that_ bad. On the P4 it is a couple
of % on the microbenchmarks. The G5 is worse, but even then I
don't think it is I'll try to improve that and get back to you.
The problem is that lock/unlock_page is expensive on powerpc, and
if we improve that, we improve more than just the fault handler...
The attached patch gets performance up a bit by avoiding some
barriers and some cachelines:
G5
pagefault fork exec
2.6.21 1.49-1.51 164.6-170.8 741.8-760.3
+patch 1.71-1.73 175.2-180.8 780.5-794.2
+patch2 1.61-1.63 169.8-175.0 748.6-757.0
So that brings the fork/exec hits down to much less than 5%, and
would likely speed up other things that lock the page, like write
or page reclaim.
I think we could get further performance improvement by
implementing arch specific bitops for lock/unlock operations,
so we don't need to use things like smb_mb__before_clear_bit()
if they aren't needed or full barriers in the test_and_set_bit().
--
SUSE Labs, Novell Inc.
[-- Attachment #2: mm-unlock-speedup.patch --]
[-- Type: text/plain, Size: 2228 bytes --]
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h 2007-04-24 10:39:56.000000000 +1000
+++ linux-2.6/include/linux/page-flags.h 2007-05-03 08:38:53.000000000 +1000
@@ -91,6 +91,8 @@
#define PG_nosave_free 18 /* Used for system suspend/resume */
#define PG_buddy 19 /* Page is free, on buddy lists */
+#define PG_waiters 20 /* Page has PG_locked waiters */
+
/* PG_owner_priv_1 users should have descriptive aliases */
#define PG_checked PG_owner_priv_1 /* Used by some filesystems */
Index: linux-2.6/include/linux/pagemap.h
===================================================================
--- linux-2.6.orig/include/linux/pagemap.h 2007-04-24 10:39:56.000000000 +1000
+++ linux-2.6/include/linux/pagemap.h 2007-05-03 08:35:08.000000000 +1000
@@ -141,7 +141,7 @@
static inline void lock_page(struct page *page)
{
might_sleep();
- if (TestSetPageLocked(page))
+ if (unlikely(TestSetPageLocked(page)))
__lock_page(page);
}
@@ -152,7 +152,7 @@
static inline void lock_page_nosync(struct page *page)
{
might_sleep();
- if (TestSetPageLocked(page))
+ if (unlikely(TestSetPageLocked(page)))
__lock_page_nosync(page);
}
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c 2007-05-02 15:00:26.000000000 +1000
+++ linux-2.6/mm/filemap.c 2007-05-03 08:34:32.000000000 +1000
@@ -532,11 +532,13 @@
*/
void fastcall unlock_page(struct page *page)
{
+ VM_BUG_ON(!PageLocked(page));
smp_mb__before_clear_bit();
- if (!TestClearPageLocked(page))
- BUG();
- smp_mb__after_clear_bit();
- wake_up_page(page, PG_locked);
+ ClearPageLocked(page);
+ if (unlikely(test_bit(PG_waiters, &page->flags))) {
+ clear_bit(PG_waiters, &page->flags);
+ wake_up_page(page, PG_locked);
+ }
}
EXPORT_SYMBOL(unlock_page);
@@ -568,6 +570,11 @@
{
DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
+ set_bit(PG_waiters, &page->flags);
+ if (unlikely(!TestSetPageLocked(page))) {
+ clear_bit(PG_waiters, &page->flags);
+ return;
+ }
__wait_on_bit_lock(page_waitqueue(page), &wait, sync_page,
TASK_UNINTERRUPTIBLE);
}
next prev parent reply other threads:[~2007-05-03 1:32 UTC|newest]
Thread overview: 180+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-04-30 23:20 2.6.22 -mm merge plans Andrew Morton
2007-04-30 23:59 ` Bill Irwin
2007-05-01 0:09 ` nfsd/md patches " Neil Brown
2007-05-01 9:08 ` Christoph Hellwig
2007-05-01 9:15 ` Andrew Morton
2007-05-01 9:21 ` Christoph Hellwig
2007-05-01 9:52 ` Neil Brown
2007-05-01 10:15 ` Christoph Hellwig
2007-05-01 14:34 ` Trond Myklebust
2007-05-01 0:54 ` MADV_FREE functionality Rik van Riel
2007-05-01 1:18 ` Andrew Morton
2007-05-01 1:23 ` Rik van Riel
2007-05-01 7:13 ` Jakub Jelinek
2007-05-01 1:23 ` Ulrich Drepper
2007-05-01 1:39 ` 2.6.22 -mm merge plans Stefan Richter
2007-05-01 2:30 ` 2.6.22 -mm merge plans (RE: input) Dmitry Torokhov
2007-05-01 8:11 ` 2.6.22 -mm merge plans -- pfn_valid_within Andy Whitcroft
2007-05-01 8:19 ` Andrew Morton
2007-05-01 8:44 ` 2.6.22 -mm merge plans -- vm bugfixes Nick Piggin
2007-05-01 8:54 ` Andrew Morton
2007-05-01 19:31 ` Hugh Dickins
2007-05-02 3:08 ` Nick Piggin
2007-05-02 9:15 ` Nick Piggin
2007-05-02 14:00 ` Hugh Dickins
2007-05-03 1:32 ` Nick Piggin [this message]
2007-05-03 10:37 ` Christoph Hellwig
2007-05-03 12:56 ` Nick Piggin
2007-05-04 9:23 ` Nick Piggin
2007-05-04 9:43 ` Nick Piggin
2007-05-08 3:03 ` Benjamin Herrenschmidt
2007-05-03 12:24 ` Hugh Dickins
2007-05-03 12:43 ` Nick Piggin
2007-05-03 12:58 ` Hugh Dickins
2007-05-03 13:08 ` Nick Piggin
2007-05-03 16:52 ` Andrew Morton
2007-05-04 4:16 ` Nick Piggin
2007-05-09 12:34 ` Nick Piggin
2007-05-09 14:28 ` Hugh Dickins
2007-05-09 14:45 ` Nick Piggin
2007-05-09 15:38 ` Hugh Dickins
2007-05-09 22:24 ` Nick Piggin
2007-05-01 8:46 ` pcmcia ioctl removal Christoph Hellwig
2007-05-01 8:56 ` Russell King
2007-05-01 8:57 ` Willy Tarreau
2007-05-01 9:08 ` Andrew Morton
2007-05-01 14:46 ` Adrian Bunk
2007-05-01 9:16 ` Robert P. J. Day
2007-05-01 9:44 ` Willy Tarreau
2007-05-01 10:16 ` Robert P. J. Day
2007-05-01 10:26 ` Gabriel C
2007-05-01 10:52 ` Willy Tarreau
2007-05-01 10:12 ` Jan Engelhardt
2007-05-01 11:00 ` Willy Tarreau
2007-05-01 12:06 ` Konstantin Münning
2007-05-01 13:56 ` Rogan Dawes
2007-05-01 19:10 ` Russell King
2007-05-01 20:41 ` Jan Engelhardt
2007-05-09 12:54 ` Pavel Machek
2007-05-09 13:00 ` Robert P. J. Day
2007-05-09 13:03 ` Adrian Bunk
2007-05-09 19:11 ` Romano Giannetti
2007-05-10 12:40 ` Adrian Bunk
2007-05-01 8:48 ` pci hotplug patches Christoph Hellwig
2007-05-02 3:57 ` Greg KH
2007-05-13 20:59 ` Christoph Hellwig
2007-05-14 11:48 ` Greg KH
2007-05-01 8:54 ` cache-pipe-buf-page-address-for-non-highmem-arch.patch Christoph Hellwig
2007-05-01 9:04 ` cache-pipe-buf-page-address-for-non-highmem-arch.patch Andrew Morton
2007-05-01 11:31 ` cache-pipe-buf-page-address-for-non-highmem-arch.patch Andi Kleen
2007-05-03 3:48 ` cache-pipe-buf-page-address-for-non-highmem-arch.patch Ken Chen
2007-05-01 8:55 ` consolidate-generic_writepages-and-mpage_writepages.patch Christoph Hellwig
2007-05-01 9:17 ` 2.6.22 -mm merge plans Pekka Enberg
2007-05-01 9:24 ` Christoph Hellwig
2007-05-01 9:37 ` Peter Zijlstra
2007-05-01 12:19 ` Andi Kleen
2007-05-01 17:12 ` Pekka Enberg
2007-05-01 10:16 ` fragmentation avoidance " Mel Gorman
2007-05-01 13:02 ` 2.6.22 -mm merge plans -- lumpy reclaim Andy Whitcroft
2007-05-01 18:03 ` Peter Zijlstra
2007-05-01 19:00 ` Andrew Morton
2007-05-01 14:54 ` fragmentation avoidance Re: 2.6.22 -mm merge plans Christoph Lameter
2007-05-01 19:00 ` Mel Gorman
2007-05-01 18:57 ` Andrew Morton
2007-05-07 13:07 ` Yasunori Goto
2007-05-01 14:31 ` 2.6.22 -mm merge plans: mm-more-rmap-checking Hugh Dickins
2007-05-02 1:42 ` Nick Piggin
2007-05-02 13:17 ` Hugh Dickins
2007-05-03 0:18 ` Nick Piggin
2007-05-01 16:56 ` 2.6.22 -mm merge plans Zan Lynx
2007-05-01 17:06 ` 2.6.22 -mm merge plans: mm-detach_vmas_to_be_unmapped-fix Hugh Dickins
2007-05-01 18:10 ` 2.6.22 -mm merge plans: slub Hugh Dickins
2007-05-01 19:25 ` Christoph Lameter
2007-05-01 19:55 ` Andrew Morton
2007-05-01 20:19 ` Hugh Dickins
2007-05-01 20:36 ` Andrew Morton
2007-05-01 20:46 ` Christoph Lameter
2007-05-01 21:09 ` Andrew Morton
2007-05-02 12:54 ` Hugh Dickins
2007-05-02 17:03 ` Christoph Lameter
2007-05-02 19:11 ` Andrew Morton
2007-05-02 19:42 ` Christoph Lameter
2007-05-02 19:54 ` Sam Ravnborg
2007-05-02 20:14 ` Christoph Lameter
2007-05-02 18:52 ` Siddha, Suresh B
2007-05-02 18:58 ` Christoph Lameter
2007-05-01 21:08 ` Christoph Lameter
2007-05-02 12:45 ` Hugh Dickins
2007-05-02 17:01 ` Christoph Lameter
2007-05-02 18:08 ` Hugh Dickins
2007-05-02 18:28 ` Christoph Lameter
2007-05-02 18:42 ` Andrew Morton
2007-05-02 18:53 ` Christoph Lameter
2007-05-02 17:25 ` Christoph Lameter
2007-05-02 18:36 ` Hugh Dickins
2007-05-02 18:39 ` Christoph Lameter
2007-05-02 18:57 ` Andrew Morton
2007-05-02 19:01 ` Christoph Lameter
2007-05-02 19:18 ` Pekka Enberg
2007-05-02 19:34 ` Christoph Lameter
2007-05-02 19:43 ` Christoph Lameter
2007-05-03 8:15 ` Andrew Morton
2007-05-03 8:27 ` William Lee Irwin III
2007-05-03 16:30 ` Christoph Lameter
2007-05-03 8:46 ` Hugh Dickins
2007-05-03 8:57 ` Andrew Morton
2007-05-03 9:15 ` Hugh Dickins
2007-05-03 21:04 ` 2.6.22 -mm merge plans: slub on PowerPC Hugh Dickins
2007-05-03 21:15 ` Christoph Lameter
2007-05-03 22:41 ` Hugh Dickins
2007-05-04 0:25 ` Benjamin Herrenschmidt
2007-05-04 0:54 ` Christoph Lameter
2007-05-03 16:45 ` 2.6.22 -mm merge plans: slub Christoph Lameter
2007-05-03 15:54 ` swap-prefetch: 2.6.22 -mm merge plans Ingo Molnar
2007-05-03 16:15 ` Michal Piotrowski
2007-05-03 16:23 ` Michal Piotrowski
2007-05-03 22:14 ` Con Kolivas
2007-05-04 7:34 ` Nick Piggin
2007-05-04 8:52 ` Ingo Molnar
2007-05-04 9:09 ` Nick Piggin
2007-05-04 12:10 ` Con Kolivas
2007-05-05 8:42 ` Con Kolivas
2007-05-06 10:13 ` [ck] " Antonino Ingargiola
2007-05-06 18:22 ` Jory A. Pratt
2007-05-09 23:28 ` Con Kolivas
2007-05-10 0:05 ` Nick Piggin
2007-05-10 1:34 ` Con Kolivas
2007-05-10 1:56 ` Nick Piggin
2007-05-10 3:48 ` Ray Lee
2007-05-10 3:56 ` Nick Piggin
2007-05-10 5:52 ` Ray Lee
2007-05-10 7:04 ` Nick Piggin
2007-05-10 7:20 ` William Lee Irwin III
2007-05-10 12:34 ` Ray Lee
2007-05-12 4:46 ` [PATCH] mm: swap prefetch improvements Con Kolivas
2007-05-12 5:03 ` Paul Jackson
2007-05-12 5:15 ` Con Kolivas
2007-05-12 5:51 ` Paul Jackson
2007-05-12 7:28 ` Con Kolivas
2007-05-12 8:14 ` Paul Jackson
2007-05-12 8:21 ` Con Kolivas
2007-05-12 8:37 ` Paul Jackson
2007-05-12 8:57 ` [PATCH respin] " Con Kolivas
2007-05-21 10:03 ` [PATCH] " Ingo Molnar
2007-05-21 13:44 ` Con Kolivas
2007-05-21 16:00 ` Ingo Molnar
2007-05-22 10:15 ` Antonino Ingargiola
2007-05-22 10:20 ` Con Kolivas
2007-05-22 10:25 ` Ingo Molnar
2007-05-22 10:37 ` Con Kolivas
2007-05-22 10:46 ` Ingo Molnar
2007-05-22 10:54 ` Con Kolivas
2007-05-22 10:57 ` Ingo Molnar
2007-05-22 11:04 ` Con Kolivas
[not found] ` <20070522111104.GA14950@elte.hu>
2007-05-22 11:12 ` Ingo Molnar
2007-05-22 20:18 ` [ck] " Michael Chang
2007-05-22 20:31 ` Ingo Molnar
2007-05-10 3:58 ` swap-prefetch: 2.6.22 -mm merge plans Con Kolivas
2007-05-07 14:28 ` Bill Davidsen
2007-05-07 14:18 ` Bill Davidsen
2007-05-07 17:47 ` Josef Sipek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=46393BA7.6030106@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=akpm@linux-foundation.org \
--cc=andrea@suse.de \
--cc=hch@infradead.org \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox