linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>, Andrew Morton <akpm@osdl.org>,
	Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [patch 3/9] mm: speculative get page
Date: Mon, 25 Sep 2006 12:00:14 +1000	[thread overview]
Message-ID: <4517382E.8010308@yahoo.com.au> (raw)
In-Reply-To: <Pine.LNX.4.64.0609241802400.7935@blonde.wat.veritas.com>

Hugh Dickins wrote:

>On Fri, 22 Sep 2006, Nick Piggin wrote:
>
>>Index: linux-2.6/include/linux/page-flags.h
>>===================================================================
>>--- linux-2.6.orig/include/linux/page-flags.h
>>+++ linux-2.6/include/linux/page-flags.h
>>@@ -86,6 +86,8 @@
>> #define PG_nosave_free		18	/* Free, should not be written */
>> #define PG_buddy		19	/* Page is free, on buddy lists */
>> 
>>+#define PG_nonewrefs		20	/* Block concurrent pagecache lookups
>>+					 * while testing refcount */
>>
>
>Something I didn't get around to mentioning last time: I could well
>be mistaken, but it seemed that you could get along without all the
>PageNoNewRefs stuff, at cost of using something (too expensive?)
>like atomic_cmpxchg(&page->_count, 2, 0) in remove_mapping() and
>migrate_page_move_mapping(); compensated by simplification at the
>other end in page_cache_get_speculative(), which is already
>expected to be the hotter path.
>

Wow. That's amazing, why didn't I think of that? ;) Now that
we have a PG_buddy, this is going to work nicely.

>I find it unaesthetic (suspect you do too) to add that adhoc
>PageNoNewRefs method of freezing the count, when you're already
>demanding that count 0 must be frozen: why not make use of that?
>then since you know it's frozen while 0, you can easily insert
>the proper count at the end of the critical region.
>

Yes, and without using atomic ops, too.

>
>I didn't attempt to work out what memory barriers would be needed,
>but did test a version working that way on i386 - though I seem
>to have tidied those mods away to /dev/null since then.
>

Memory barriers will be reduced, because we're now only operating
on the single variable, rather than 2 (_count and flags), so we
don't need anything to order them (other than normal cache coherency).

Importantly, this will cut the smp_rmb() out of the speculative get,
which I suspect is why ia64 had slightly worse performance there.
Beautiful. (it will also cut out the smp_wmb()s and one atomic op out
of the write side).

>We disagreed over whether PageNoNewRefs usage in add_to_page_cache
>and __add_to_swap_cache was the same as in remove_mapping; but I
>think we agreed it could be avoided completely in those, just by
>being more careful about the ordering of the updates to struct page
>(I think it looked like the SetPageLocked needed to come earlier,
>but I forget the logic right now).
>

That's right. I have that in a followup patch, but in the interests
of keeping things small, I won't submit it for the first iteration.

>
>>+static inline int page_cache_get_speculative(struct page *page)
>>+{
>>+	VM_BUG_ON(in_interrupt());
>>+
>>+#ifndef CONFIG_SMP
>>+# ifdef CONFIG_PREEMPT
>>+	VM_BUG_ON(!in_atomic());
>>+# endif
>>+	/*
>>+	 * Preempt must be disabled here - we rely on rcu_read_lock doing
>>+	 * this for us.
>>+	 *
>>+	 * Pagecache won't be truncated from interrupt context, so if we have
>>+	 * found a page in the radix tree here, we have pinned its refcount by
>>+	 * disabling preempt, and hence no need for the "speculative get" that
>>+	 * SMP requires.
>>+	 */
>>+	VM_BUG_ON(page_count(page) == 0);
>>+	atomic_inc(&page->_count);
>>+
>>+#else
>>+	if (unlikely(!get_page_unless_zero(page)))
>>+		return 0; /* page has been freed */
>>
>
>This is the test which fails nicely whenever count is set to 0,
>whether because the page has been freed or because you wish to
>freeze it.  But if you do make such a change, callers of
>page_cache_get_speculative may need to loop a little differently
>when it fails (the page might not be freed).
>

Yes, they can just retry -- in the case that the page had been freed, 
they'll
find NULL in the radix tree slot on the next iteration: the 'return' here is
just a little shortcut.

>
>>+	VM_BUG_ON(PageCompound(page) && (struct page *)page_private(page) != page);
>>
>
>I found that VM_BUG_ON confusing, because it's only catching a tiny
>proportion of the cases you're interested in ruling out: most high
>order pages aren't PageCompound (but only the PageCompound ones offer
>that kind of check).  If you really want to keep the check, I think
>it needs a comment to explain that; but I'd just delete the line.
>

But it is OK to take a spec ref to a higher order non compound page, because
they follow the same refcounting rules as if they are individual order-0
pages.

Compound pages should be OK, because their constituent pages will have a 
count
of 0 and so will fail the previous test. This check was just for my own
satisfaction but I left it in as a form of commenting, however as you say I
should probably elaborate on that too.

Thank you thank you,
Nick
--

Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-09-25  2:00 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-22 19:22 [patch 0/4] lockless pagecache for 2.6.18-rc7-mm1 Nick Piggin
2006-09-22 19:22 ` [patch 2/4] radix-tree: use indirect bit Nick Piggin
2006-09-22 19:22 ` [patch 2/9] radix-tree: gang_lookup_slot Nick Piggin
2006-09-22 19:22 ` [patch 3/9] mm: speculative get page Nick Piggin
2006-09-23 10:01   ` Peter Zijlstra
2006-09-24 18:01   ` Hugh Dickins
2006-09-25  2:00     ` Nick Piggin [this message]
2006-09-25 11:47       ` Nick Piggin
2006-09-25 13:04         ` Peter Zijlstra
2006-09-25 22:41           ` Nick Piggin
2006-09-22 19:22 ` [patch 4/9] mm: lockless pagecache lookups Nick Piggin
2006-09-22 20:01   ` Lee Schermerhorn
2006-09-23  2:35     ` Nick Piggin
2006-09-22 19:24 ` [patch 0/4] lockless pagecache for 2.6.18-rc7-mm1 Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4517382E.8010308@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=akpm@osdl.org \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox