From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>, Andrew Morton <akpm@osdl.org>,
Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [patch 3/9] mm: speculative get page
Date: Mon, 25 Sep 2006 12:00:14 +1000 [thread overview]
Message-ID: <4517382E.8010308@yahoo.com.au> (raw)
In-Reply-To: <Pine.LNX.4.64.0609241802400.7935@blonde.wat.veritas.com>
Hugh Dickins wrote:
>On Fri, 22 Sep 2006, Nick Piggin wrote:
>
>>Index: linux-2.6/include/linux/page-flags.h
>>===================================================================
>>--- linux-2.6.orig/include/linux/page-flags.h
>>+++ linux-2.6/include/linux/page-flags.h
>>@@ -86,6 +86,8 @@
>> #define PG_nosave_free 18 /* Free, should not be written */
>> #define PG_buddy 19 /* Page is free, on buddy lists */
>>
>>+#define PG_nonewrefs 20 /* Block concurrent pagecache lookups
>>+ * while testing refcount */
>>
>
>Something I didn't get around to mentioning last time: I could well
>be mistaken, but it seemed that you could get along without all the
>PageNoNewRefs stuff, at cost of using something (too expensive?)
>like atomic_cmpxchg(&page->_count, 2, 0) in remove_mapping() and
>migrate_page_move_mapping(); compensated by simplification at the
>other end in page_cache_get_speculative(), which is already
>expected to be the hotter path.
>
Wow. That's amazing, why didn't I think of that? ;) Now that
we have a PG_buddy, this is going to work nicely.
>I find it unaesthetic (suspect you do too) to add that adhoc
>PageNoNewRefs method of freezing the count, when you're already
>demanding that count 0 must be frozen: why not make use of that?
>then since you know it's frozen while 0, you can easily insert
>the proper count at the end of the critical region.
>
Yes, and without using atomic ops, too.
>
>I didn't attempt to work out what memory barriers would be needed,
>but did test a version working that way on i386 - though I seem
>to have tidied those mods away to /dev/null since then.
>
Memory barriers will be reduced, because we're now only operating
on the single variable, rather than 2 (_count and flags), so we
don't need anything to order them (other than normal cache coherency).
Importantly, this will cut the smp_rmb() out of the speculative get,
which I suspect is why ia64 had slightly worse performance there.
Beautiful. (it will also cut out the smp_wmb()s and one atomic op out
of the write side).
>We disagreed over whether PageNoNewRefs usage in add_to_page_cache
>and __add_to_swap_cache was the same as in remove_mapping; but I
>think we agreed it could be avoided completely in those, just by
>being more careful about the ordering of the updates to struct page
>(I think it looked like the SetPageLocked needed to come earlier,
>but I forget the logic right now).
>
That's right. I have that in a followup patch, but in the interests
of keeping things small, I won't submit it for the first iteration.
>
>>+static inline int page_cache_get_speculative(struct page *page)
>>+{
>>+ VM_BUG_ON(in_interrupt());
>>+
>>+#ifndef CONFIG_SMP
>>+# ifdef CONFIG_PREEMPT
>>+ VM_BUG_ON(!in_atomic());
>>+# endif
>>+ /*
>>+ * Preempt must be disabled here - we rely on rcu_read_lock doing
>>+ * this for us.
>>+ *
>>+ * Pagecache won't be truncated from interrupt context, so if we have
>>+ * found a page in the radix tree here, we have pinned its refcount by
>>+ * disabling preempt, and hence no need for the "speculative get" that
>>+ * SMP requires.
>>+ */
>>+ VM_BUG_ON(page_count(page) == 0);
>>+ atomic_inc(&page->_count);
>>+
>>+#else
>>+ if (unlikely(!get_page_unless_zero(page)))
>>+ return 0; /* page has been freed */
>>
>
>This is the test which fails nicely whenever count is set to 0,
>whether because the page has been freed or because you wish to
>freeze it. But if you do make such a change, callers of
>page_cache_get_speculative may need to loop a little differently
>when it fails (the page might not be freed).
>
Yes, they can just retry -- in the case that the page had been freed,
they'll
find NULL in the radix tree slot on the next iteration: the 'return' here is
just a little shortcut.
>
>>+ VM_BUG_ON(PageCompound(page) && (struct page *)page_private(page) != page);
>>
>
>I found that VM_BUG_ON confusing, because it's only catching a tiny
>proportion of the cases you're interested in ruling out: most high
>order pages aren't PageCompound (but only the PageCompound ones offer
>that kind of check). If you really want to keep the check, I think
>it needs a comment to explain that; but I'd just delete the line.
>
But it is OK to take a spec ref to a higher order non compound page, because
they follow the same refcounting rules as if they are individual order-0
pages.
Compound pages should be OK, because their constituent pages will have a
count
of 0 and so will fail the previous test. This check was just for my own
satisfaction but I left it in as a form of commenting, however as you say I
should probably elaborate on that too.
Thank you thank you,
Nick
--
Send instant messages to your online friends http://au.messenger.yahoo.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-09-25 2:00 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-09-22 19:22 [patch 0/4] lockless pagecache for 2.6.18-rc7-mm1 Nick Piggin
2006-09-22 19:22 ` [patch 2/4] radix-tree: use indirect bit Nick Piggin
2006-09-22 19:22 ` [patch 2/9] radix-tree: gang_lookup_slot Nick Piggin
2006-09-22 19:22 ` [patch 3/9] mm: speculative get page Nick Piggin
2006-09-23 10:01 ` Peter Zijlstra
2006-09-24 18:01 ` Hugh Dickins
2006-09-25 2:00 ` Nick Piggin [this message]
2006-09-25 11:47 ` Nick Piggin
2006-09-25 13:04 ` Peter Zijlstra
2006-09-25 22:41 ` Nick Piggin
2006-09-22 19:22 ` [patch 4/9] mm: lockless pagecache lookups Nick Piggin
2006-09-22 20:01 ` Lee Schermerhorn
2006-09-23 2:35 ` Nick Piggin
2006-09-22 19:24 ` [patch 0/4] lockless pagecache for 2.6.18-rc7-mm1 Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4517382E.8010308@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=akpm@osdl.org \
--cc=hugh@veritas.com \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox