linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>, Andrew Morton <akpm@osdl.org>,
	Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [patch 3/9] mm: speculative get page
Date: Mon, 25 Sep 2006 15:04:13 +0200	[thread overview]
Message-ID: <1159189453.5018.25.camel@lappy> (raw)
In-Reply-To: <20060925114739.GA31148@wotan.suse.de>

On Mon, 2006-09-25 at 13:47 +0200, Nick Piggin wrote:

> +/*
> + * speculatively take a reference to a page.
> + * If the page is free (_count == 0), then _count is untouched, and 0
> + * is returned. Otherwise, _count is incremented by 1 and 1 is returned.
> + *
> + * This function must be run in the same rcu_read_lock() section as has
> + * been used to lookup the page in the pagecache radix-tree: this allows
> + * allocators to use a synchronize_rcu() to stabilize _count.
> + *
> + * Unless an RCU grace period has passed, the count of all pages coming out
> + * of the allocator must be considered unstable. page_count may return higher
> + * than expected, and put_page must be able to do the right thing when the
> + * page has been finished with (because put_page is what is used to drop an
> + * invalid speculative reference).
> + *
> + * This forms the core of the lockless pagecache locking protocol, where
> + * the lookup-side (eg. find_get_page) has the following pattern:
> + * 1. find page in radix tree
> + * 2. conditionally increment refcount
> + * 3. check the page is still in pagecache (if no, goto 1)
> + *
> + * Remove-side that cares about stability of _count (eg. reclaim) has the
                                   ^^^^^^^^^^^^^^^^^^^
is that the reason that the following two code paths are good without
change:

  truncate_inode_page_range()
   truncate_complete_page()
     remove_from_page_cache()
       radix_tree_delete()

and

  zap_pte_range()
    free_swap_and_cache()  <-- does check page_count()
      delete_from_swap_cache()
        __delete_from_swap_cache()
          radix_tree_delete()

>From the comments around the truncate bit it seems to be ok with keeping
the page as anonymous, however the zap_pte_range() thing does seem to
want to have a stable page_count().

> + * following (with tree_lock held for write):
> + * A. atomically check refcount is correct and set it to 0 (atomic_cmpxchg)
> + * B. remove page from pagecache
> + * C. free the page
> + *
> + * There are 2 critical interleavings that matter:
> + * - 2 runs before A: in this case, A sees elevated refcount and bails out
> + * - A runs before 2: in this case, 2 sees zero refcount and retries;
> + *   subsequently, B will complete and 1 will find no page, causing the
> + *   lookup to return NULL.
> + *
> + * It is possible that between 1 and 2, the page is removed then the exact same
> + * page is inserted into the same position in pagecache. That's OK: the
> + * old find_get_page using tree_lock could equally have run before or after
> + * such a re-insertion, depending on order that locks are granted.
> + *
> + * Lookups racing against pagecache insertion isn't a big problem: either 1
> + * will find the page or it will not. Likewise, the old find_get_page could run
> + * either before the insertion or afterwards, depending on timing.
> + */

Awesome code ;-)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-09-25 13:04 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-22 19:22 [patch 0/4] lockless pagecache for 2.6.18-rc7-mm1 Nick Piggin
2006-09-22 19:22 ` [patch 2/4] radix-tree: use indirect bit Nick Piggin
2006-09-22 19:22 ` [patch 2/9] radix-tree: gang_lookup_slot Nick Piggin
2006-09-22 19:22 ` [patch 3/9] mm: speculative get page Nick Piggin
2006-09-23 10:01   ` Peter Zijlstra
2006-09-24 18:01   ` Hugh Dickins
2006-09-25  2:00     ` Nick Piggin
2006-09-25 11:47       ` Nick Piggin
2006-09-25 13:04         ` Peter Zijlstra [this message]
2006-09-25 22:41           ` Nick Piggin
2006-09-22 19:22 ` [patch 4/9] mm: lockless pagecache lookups Nick Piggin
2006-09-22 20:01   ` Lee Schermerhorn
2006-09-23  2:35     ` Nick Piggin
2006-09-22 19:24 ` [patch 0/4] lockless pagecache for 2.6.18-rc7-mm1 Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1159189453.5018.25.camel@lappy \
    --to=a.p.zijlstra@chello.nl \
    --cc=akpm@osdl.org \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox