linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrea Arcangeli <andrea@suse.de>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: 2.6.22 -mm merge plans -- vm bugfixes
Date: Thu, 03 May 2007 11:32:23 +1000	[thread overview]
Message-ID: <46393BA7.6030106@yahoo.com.au> (raw)
In-Reply-To: <Pine.LNX.4.64.0705021418030.16517@blonde.wat.veritas.com>

[-- Attachment #1: Type: text/plain, Size: 3875 bytes --]

Hugh Dickins wrote:
> On Wed, 2 May 2007, Nick Piggin wrote:

[snip]

> More on-topic, since you suggest doing more within vmtruncate_range
> than the filesystem: no, I'm afraid that's misdesigned, and I want
> to move almost all of it into the filesystem ->truncate_range.
> Because, if what vmtruncate_range is doing before it gets to the
> filesystem isn't to be just a waste of time, the filesystem needs
> to know what's going on in advance - just as notify_change warns
> the filesystem about a coming truncation.  But easier than inventing
> some new notification is to move it all into the filesystem, with
> unmap_mapping_range+truncate_inode_pages_range its library helpers.

Well I would prefer it to follow the same pattern as regular
truncate. I don't think it is misdesigned to call the filesystem
_first_, but I think if you do that then the filesystem should
call the vm to prepare / finish truncate, rather than open code
calls to unmap itself.


>>>But I'm pretty sure (to use your words!) regular truncate was not racy
>>>before: I believe Andrea's sequence count was handling that case fine,
>>>without a second unmap_mapping_range.
>>
>>OK, I think you're right. I _think_ it should also be OK with the
>>lock_page version as well: we should not be able to have any pages
>>after the first unmap_mapping_range call, because of the i_size
>>write. So if we have no pages, there is nothing to 'cow' from.
> 
> 
> I'd be delighted if you can remove those later unmap_mapping_ranges.
> As I recall, the important thing for the copy pages is to be holding
> the page lock (or whatever other serialization) on the copied page
> still while the copy page is inserted into pagetable: that looks
> to be so in your __do_fault.

Yeah, I think my thought process went wrong on those... I'll
revisit.


>>>But it is a shame, and leaves me wondering what you gained with the
>>>page lock there.
>>>
>>>One thing gained is ease of understanding, and if your later patches
>>>build an edifice upon the knowledge of holding that page lock while
>>>faulting, I've no wish to undermine that foundation.
>>
>>It also fixes a bug, doesn't it? ;)
> 
> 
> Well, I'd come to think that perhaps the bugs would be solved by
> that second unmap_mapping_range alone, so the pagelock changes
> just a misleading diversion.
> 
> I'm not sure how I feel about that: calling unmap_mapping_range a
> second time feels such a cheat, but if (big if) it does solve the
> races, and the pagelock method is as expensive as your numbers
> now suggest...

Well aside from being terribly ugly, it means we can still drop
the dirty bit where we'd otherwise rather not, so I don't think
we can do that.

I think there may be some way we can do this without taking the
page lock, and I was going to look at it, but I think it is
quite neat to just lock the page...

I don't think performance is _that_ bad. On the P4 it is a couple
of % on the microbenchmarks. The G5 is worse, but even then I
don't think it is I'll try to improve that and get back to you.

The problem is that lock/unlock_page is expensive on powerpc, and
if we improve that, we improve more than just the fault handler...

The attached patch gets performance up a bit by avoiding some
barriers and some cachelines:

G5
          pagefault   fork          exec
2.6.21   1.49-1.51   164.6-170.8   741.8-760.3
+patch   1.71-1.73   175.2-180.8   780.5-794.2
+patch2  1.61-1.63   169.8-175.0   748.6-757.0

So that brings the fork/exec hits down to much less than 5%, and
would likely speed up other things that lock the page, like write
or page reclaim.

I think we could get further performance improvement by
implementing arch specific bitops for lock/unlock operations,
so we don't need to use things like smb_mb__before_clear_bit()
if they aren't needed or full barriers in the test_and_set_bit().

-- 
SUSE Labs, Novell Inc.


[-- Attachment #2: mm-unlock-speedup.patch --]
[-- Type: text/plain, Size: 2228 bytes --]

Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h	2007-04-24 10:39:56.000000000 +1000
+++ linux-2.6/include/linux/page-flags.h	2007-05-03 08:38:53.000000000 +1000
@@ -91,6 +91,8 @@
 #define PG_nosave_free		18	/* Used for system suspend/resume */
 #define PG_buddy		19	/* Page is free, on buddy lists */
 
+#define PG_waiters		20	/* Page has PG_locked waiters */
+
 /* PG_owner_priv_1 users should have descriptive aliases */
 #define PG_checked		PG_owner_priv_1 /* Used by some filesystems */
 
Index: linux-2.6/include/linux/pagemap.h
===================================================================
--- linux-2.6.orig/include/linux/pagemap.h	2007-04-24 10:39:56.000000000 +1000
+++ linux-2.6/include/linux/pagemap.h	2007-05-03 08:35:08.000000000 +1000
@@ -141,7 +141,7 @@
 static inline void lock_page(struct page *page)
 {
 	might_sleep();
-	if (TestSetPageLocked(page))
+	if (unlikely(TestSetPageLocked(page)))
 		__lock_page(page);
 }
 
@@ -152,7 +152,7 @@
 static inline void lock_page_nosync(struct page *page)
 {
 	might_sleep();
-	if (TestSetPageLocked(page))
+	if (unlikely(TestSetPageLocked(page)))
 		__lock_page_nosync(page);
 }
 	
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2007-05-02 15:00:26.000000000 +1000
+++ linux-2.6/mm/filemap.c	2007-05-03 08:34:32.000000000 +1000
@@ -532,11 +532,13 @@
  */
 void fastcall unlock_page(struct page *page)
 {
+	VM_BUG_ON(!PageLocked(page));
 	smp_mb__before_clear_bit();
-	if (!TestClearPageLocked(page))
-		BUG();
-	smp_mb__after_clear_bit(); 
-	wake_up_page(page, PG_locked);
+	ClearPageLocked(page);
+	if (unlikely(test_bit(PG_waiters, &page->flags))) {
+		clear_bit(PG_waiters, &page->flags);
+		wake_up_page(page, PG_locked);
+	}
 }
 EXPORT_SYMBOL(unlock_page);
 
@@ -568,6 +570,11 @@
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
 
+	set_bit(PG_waiters, &page->flags);
+	if (unlikely(!TestSetPageLocked(page))) {
+		clear_bit(PG_waiters, &page->flags);
+		return;
+	}
 	__wait_on_bit_lock(page_waitqueue(page), &wait, sync_page,
 							TASK_UNINTERRUPTIBLE);
 }

  reply	other threads:[~2007-05-03  1:32 UTC|newest]

Thread overview: 180+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-30 23:20 2.6.22 -mm merge plans Andrew Morton
2007-04-30 23:59 ` Bill Irwin
2007-05-01  0:09 ` nfsd/md patches " Neil Brown
2007-05-01  9:08   ` Christoph Hellwig
2007-05-01  9:15     ` Andrew Morton
2007-05-01  9:21       ` Christoph Hellwig
2007-05-01  9:52     ` Neil Brown
2007-05-01 10:15       ` Christoph Hellwig
2007-05-01 14:34         ` Trond Myklebust
2007-05-01  0:54 ` MADV_FREE functionality Rik van Riel
2007-05-01  1:18   ` Andrew Morton
2007-05-01  1:23     ` Rik van Riel
2007-05-01  7:13     ` Jakub Jelinek
2007-05-01  1:23   ` Ulrich Drepper
2007-05-01  1:39 ` 2.6.22 -mm merge plans Stefan Richter
2007-05-01  2:30 ` 2.6.22 -mm merge plans (RE: input) Dmitry Torokhov
2007-05-01  8:11 ` 2.6.22 -mm merge plans -- pfn_valid_within Andy Whitcroft
2007-05-01  8:19   ` Andrew Morton
2007-05-01  8:44 ` 2.6.22 -mm merge plans -- vm bugfixes Nick Piggin
2007-05-01  8:54   ` Andrew Morton
2007-05-01 19:31   ` Hugh Dickins
2007-05-02  3:08     ` Nick Piggin
2007-05-02  9:15       ` Nick Piggin
2007-05-02 14:00       ` Hugh Dickins
2007-05-03  1:32         ` Nick Piggin [this message]
2007-05-03 10:37           ` Christoph Hellwig
2007-05-03 12:56             ` Nick Piggin
2007-05-04  9:23               ` Nick Piggin
2007-05-04  9:43                 ` Nick Piggin
2007-05-08  3:03                 ` Benjamin Herrenschmidt
2007-05-03 12:24           ` Hugh Dickins
2007-05-03 12:43             ` Nick Piggin
2007-05-03 12:58               ` Hugh Dickins
2007-05-03 13:08                 ` Nick Piggin
2007-05-03 16:52           ` Andrew Morton
2007-05-04  4:16             ` Nick Piggin
2007-05-09 12:34         ` Nick Piggin
2007-05-09 14:28           ` Hugh Dickins
2007-05-09 14:45             ` Nick Piggin
2007-05-09 15:38               ` Hugh Dickins
2007-05-09 22:24                 ` Nick Piggin
2007-05-01  8:46 ` pcmcia ioctl removal Christoph Hellwig
2007-05-01  8:56   ` Russell King
2007-05-01  8:57   ` Willy Tarreau
2007-05-01  9:08     ` Andrew Morton
2007-05-01 14:46       ` Adrian Bunk
2007-05-01  9:16   ` Robert P. J. Day
2007-05-01  9:44     ` Willy Tarreau
2007-05-01 10:16       ` Robert P. J. Day
2007-05-01 10:26       ` Gabriel C
2007-05-01 10:52         ` Willy Tarreau
2007-05-01 10:12     ` Jan Engelhardt
2007-05-01 11:00       ` Willy Tarreau
2007-05-01 12:06         ` Konstantin Münning
2007-05-01 13:56         ` Rogan Dawes
2007-05-01 19:10       ` Russell King
2007-05-01 20:41         ` Jan Engelhardt
2007-05-09 12:54   ` Pavel Machek
2007-05-09 13:00     ` Robert P. J. Day
2007-05-09 13:03     ` Adrian Bunk
2007-05-09 19:11       ` Romano Giannetti
2007-05-10 12:40         ` Adrian Bunk
2007-05-01  8:48 ` pci hotplug patches Christoph Hellwig
2007-05-02  3:57   ` Greg KH
2007-05-13 20:59     ` Christoph Hellwig
2007-05-14 11:48       ` Greg KH
2007-05-01  8:54 ` cache-pipe-buf-page-address-for-non-highmem-arch.patch Christoph Hellwig
2007-05-01  9:04   ` cache-pipe-buf-page-address-for-non-highmem-arch.patch Andrew Morton
2007-05-01 11:31     ` cache-pipe-buf-page-address-for-non-highmem-arch.patch Andi Kleen
2007-05-03  3:48     ` cache-pipe-buf-page-address-for-non-highmem-arch.patch Ken Chen
2007-05-01  8:55 ` consolidate-generic_writepages-and-mpage_writepages.patch Christoph Hellwig
2007-05-01  9:17 ` 2.6.22 -mm merge plans Pekka Enberg
2007-05-01  9:24   ` Christoph Hellwig
2007-05-01  9:37   ` Peter Zijlstra
2007-05-01 12:19   ` Andi Kleen
2007-05-01 17:12     ` Pekka Enberg
2007-05-01 10:16 ` fragmentation avoidance " Mel Gorman
2007-05-01 13:02   ` 2.6.22 -mm merge plans -- lumpy reclaim Andy Whitcroft
2007-05-01 18:03     ` Peter Zijlstra
2007-05-01 19:00     ` Andrew Morton
2007-05-01 14:54   ` fragmentation avoidance Re: 2.6.22 -mm merge plans Christoph Lameter
2007-05-01 19:00     ` Mel Gorman
2007-05-01 18:57   ` Andrew Morton
2007-05-07 13:07   ` Yasunori Goto
2007-05-01 14:31 ` 2.6.22 -mm merge plans: mm-more-rmap-checking Hugh Dickins
2007-05-02  1:42   ` Nick Piggin
2007-05-02 13:17     ` Hugh Dickins
2007-05-03  0:18       ` Nick Piggin
2007-05-01 16:56 ` 2.6.22 -mm merge plans Zan Lynx
2007-05-01 17:06 ` 2.6.22 -mm merge plans: mm-detach_vmas_to_be_unmapped-fix Hugh Dickins
2007-05-01 18:10 ` 2.6.22 -mm merge plans: slub Hugh Dickins
2007-05-01 19:25   ` Christoph Lameter
2007-05-01 19:55   ` Andrew Morton
2007-05-01 20:19     ` Hugh Dickins
2007-05-01 20:36       ` Andrew Morton
2007-05-01 20:46         ` Christoph Lameter
2007-05-01 21:09           ` Andrew Morton
2007-05-02 12:54         ` Hugh Dickins
2007-05-02 17:03           ` Christoph Lameter
2007-05-02 19:11             ` Andrew Morton
2007-05-02 19:42               ` Christoph Lameter
2007-05-02 19:54                 ` Sam Ravnborg
2007-05-02 20:14                   ` Christoph Lameter
2007-05-02 18:52           ` Siddha, Suresh B
2007-05-02 18:58             ` Christoph Lameter
2007-05-01 21:08       ` Christoph Lameter
2007-05-02 12:45         ` Hugh Dickins
2007-05-02 17:01           ` Christoph Lameter
2007-05-02 18:08             ` Hugh Dickins
2007-05-02 18:28               ` Christoph Lameter
2007-05-02 18:42                 ` Andrew Morton
2007-05-02 18:53                   ` Christoph Lameter
2007-05-02 17:25           ` Christoph Lameter
2007-05-02 18:36             ` Hugh Dickins
2007-05-02 18:39               ` Christoph Lameter
2007-05-02 18:57                 ` Andrew Morton
2007-05-02 19:01                   ` Christoph Lameter
2007-05-02 19:18                     ` Pekka Enberg
2007-05-02 19:34                       ` Christoph Lameter
2007-05-02 19:43                       ` Christoph Lameter
2007-05-03  8:15             ` Andrew Morton
2007-05-03  8:27               ` William Lee Irwin III
2007-05-03 16:30                 ` Christoph Lameter
2007-05-03  8:46               ` Hugh Dickins
2007-05-03  8:57                 ` Andrew Morton
2007-05-03  9:15                   ` Hugh Dickins
2007-05-03 21:04                     ` 2.6.22 -mm merge plans: slub on PowerPC Hugh Dickins
2007-05-03 21:15                       ` Christoph Lameter
2007-05-03 22:41                         ` Hugh Dickins
2007-05-04  0:25                       ` Benjamin Herrenschmidt
2007-05-04  0:54                         ` Christoph Lameter
2007-05-03 16:45                   ` 2.6.22 -mm merge plans: slub Christoph Lameter
2007-05-03 15:54 ` swap-prefetch: 2.6.22 -mm merge plans Ingo Molnar
2007-05-03 16:15   ` Michal Piotrowski
2007-05-03 16:23     ` Michal Piotrowski
2007-05-03 22:14   ` Con Kolivas
2007-05-04  7:34   ` Nick Piggin
2007-05-04  8:52     ` Ingo Molnar
2007-05-04  9:09       ` Nick Piggin
2007-05-04 12:10       ` Con Kolivas
2007-05-05  8:42         ` Con Kolivas
2007-05-06 10:13           ` [ck] " Antonino Ingargiola
2007-05-06 18:22           ` Jory A. Pratt
2007-05-09 23:28           ` Con Kolivas
2007-05-10  0:05             ` Nick Piggin
2007-05-10  1:34               ` Con Kolivas
2007-05-10  1:56                 ` Nick Piggin
2007-05-10  3:48                   ` Ray Lee
2007-05-10  3:56                     ` Nick Piggin
2007-05-10  5:52                       ` Ray Lee
2007-05-10  7:04                         ` Nick Piggin
2007-05-10  7:20                           ` William Lee Irwin III
2007-05-10 12:34                           ` Ray Lee
2007-05-12  4:46                           ` [PATCH] mm: swap prefetch improvements Con Kolivas
2007-05-12  5:03                             ` Paul Jackson
2007-05-12  5:15                               ` Con Kolivas
2007-05-12  5:51                                 ` Paul Jackson
2007-05-12  7:28                                   ` Con Kolivas
2007-05-12  8:14                                     ` Paul Jackson
2007-05-12  8:21                                       ` Con Kolivas
2007-05-12  8:37                                         ` Paul Jackson
2007-05-12  8:57                                           ` [PATCH respin] " Con Kolivas
2007-05-21 10:03                             ` [PATCH] " Ingo Molnar
2007-05-21 13:44                               ` Con Kolivas
2007-05-21 16:00                                 ` Ingo Molnar
2007-05-22 10:15                                   ` Antonino Ingargiola
2007-05-22 10:20                                     ` Con Kolivas
2007-05-22 10:25                                       ` Ingo Molnar
2007-05-22 10:37                                         ` Con Kolivas
2007-05-22 10:46                                           ` Ingo Molnar
2007-05-22 10:54                                             ` Con Kolivas
2007-05-22 10:57                                               ` Ingo Molnar
2007-05-22 11:04                                                 ` Con Kolivas
     [not found]                                                   ` <20070522111104.GA14950@elte.hu>
2007-05-22 11:12                                                     ` Ingo Molnar
2007-05-22 20:18                                             ` [ck] " Michael Chang
2007-05-22 20:31                                               ` Ingo Molnar
2007-05-10  3:58                     ` swap-prefetch: 2.6.22 -mm merge plans Con Kolivas
2007-05-07 14:28         ` Bill Davidsen
2007-05-07 14:18       ` Bill Davidsen
2007-05-07 17:47 ` Josef Sipek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46393BA7.6030106@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=akpm@linux-foundation.org \
    --cc=andrea@suse.de \
    --cc=hch@infradead.org \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox