From: "Christian Ehrhardt" <ehrhardt@mathematik.uni-ulm.de>
To: Andrew Morton <akpm@zip.com.au>
Cc: Daniel Phillips <phillips@arcor.de>,
lkml <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: MM patches against 2.5.31
Date: Mon, 26 Aug 2002 21:48:55 +0200
Message-ID: <20020826194855.3641.qmail@thales.mathematik.uni-ulm.de>
In-Reply-To: <3D6A8082.3775C5AB@zip.com.au>
On Mon, Aug 26, 2002 at 12:24:50PM -0700, Andrew Morton wrote:
> The flaw is in doing the put_page_testzero() outside of any locking
Well, one could argue that doing the put_page_testzero outside of any
locking is a feature.
> [ ... ]
>
> 2.5.31-mm1 has tests which make this race enormously improbable [1],
> but it's still there.
Agreed. Both on the improbable and on the still there part.
> It's that `put' outside the lock which is the culprit. Normally, we
> handle that with atomic_dec_and_lock() (inodes) or by manipulating
> the refcount inside an area which has exclusion (page presence in
> pagecache).
>
> The sane, sensible and sucky way is to always take the lock:
>
> page_cache_release(page)
> {
> 	spin_lock(lru_lock);
> 	if (put_page_testzero(page)) {
> 		lru_cache_del(page);
> 		__free_pages_ok(page, 0);
> 	}
> 	spin_unlock(lru_lock);
> }
That would probably solve the problem.
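For comparison, the atomic_dec_and_lock() pattern mentioned above takes the
lock only when the count might reach zero. Here is a rough userspace sketch
of that idea using C11 atomics. The names dec_and_lock(), toy_spin_lock(),
toy_spin_unlock() and count_zero_transitions() are invented for this
illustration, and an atomic_flag spinlock stands in for lru_lock; this is not
the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy spinlock standing in for lru_lock. */
static void toy_spin_lock(atomic_flag *lock)
{
    while (atomic_flag_test_and_set(lock))
        ;   /* spin until the flag was previously clear */
}

static void toy_spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}

/*
 * Approximation of the atomic_dec_and_lock() idea.
 * Returns true with the lock held if the count dropped to zero,
 * false with the lock not held otherwise.
 */
static bool dec_and_lock(atomic_int *count, atomic_flag *lock)
{
    int old = atomic_load(count);

    /* Fast path: drop a reference without the lock unless it is the last. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false;
        /* A failed compare-exchange reloaded 'old'; retry. */
    }

    /* Slow path: a decrement that may reach zero happens under the lock. */
    toy_spin_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;    /* caller frees the object, then unlocks */
    toy_spin_unlock(lock);
    return false;
}

/* Drop 'refs' references one by one and count how often zero was reached. */
static int count_zero_transitions(int refs)
{
    atomic_flag lock = ATOMIC_FLAG_INIT;
    atomic_int count;
    int zeroes = 0;

    assert(refs > 0);
    atomic_init(&count, refs);
    for (int i = 0; i < refs; i++) {
        if (dec_and_lock(&count, &lock)) {
            zeroes++;   /* the page would be freed here */
            toy_spin_unlock(&lock);
        }
    }
    return zeroes;
}
```

Calling count_zero_transitions(3) drops three references and takes the lock
only for the last one. The exclusion only helps if every path that can find
the object through the list also takes its reference under the same lock.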
> Because this provides exclusion from another CPU discovering the page
> via the LRU.
>
> So taking the above as the design principle, how can we speed it up?
> How to avoid taking the lock in every page_cache_release()? Maybe:
>
> page_cache_release(page)
> {
> 	if (page_count(page) == 1) {
> 		spin_lock(lru_lock);
> 		if (put_page_testzero(page)) {
> 			if (PageLRU(page))
> 				__lru_cache_del(page);
> 			__free_pages_ok(page);
> 		}
> 		spin_unlock(lru_lock);
> 	} else {
> 		atomic_dec(&page->count);
> 	}
> }
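However, this is an incredibly bad idea if the page is NOT on the LRU.
Think of two instances of page_cache_release racing against each other:
both see page_count(page) == 2, both take the atomic_dec path, and the
count drops to zero without anyone freeing the page. The result is a
leaked page which is not on the LRU, so page reclaim never finds it.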
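The interleaving can be replayed deterministically in userspace. What follows
is only a sketch: struct fake_page, saw_last_ref(), finish_release() and
racy_release_leaks() are made-up names, and the two CPUs are simulated by
splitting the proposed function into its unlocked check and its decrement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only what the race needs. */
struct fake_page {
    atomic_int count;   /* reference count */
    bool on_lru;        /* would be PageLRU() in the kernel */
    bool freed;         /* set when the page is handed back */
};

/* Step 1 of the proposed page_cache_release(): the unlocked check. */
static bool saw_last_ref(struct fake_page *page)
{
    return atomic_load(&page->count) == 1;
}

/* Step 2: act on whatever step 1 decided, possibly much later. */
static void finish_release(struct fake_page *page, bool last_ref)
{
    if (last_ref) {
        /* lru_lock would be held around this branch */
        if (atomic_fetch_sub(&page->count, 1) == 1)
            page->freed = true;
    } else {
        atomic_fetch_sub(&page->count, 1);  /* the plain atomic_dec() */
    }
}

/* Returns true if the page ends with count 0, unfreed and off the LRU. */
static bool racy_release_leaks(void)
{
    struct fake_page page = { .on_lru = false, .freed = false };
    atomic_init(&page.count, 2);

    /* Both CPUs run the check before either one decrements. */
    bool cpu0_last = saw_last_ref(&page);   /* count is 2: not last */
    bool cpu1_last = saw_last_ref(&page);   /* count is 2: not last */
    assert(!cpu0_last && !cpu1_last);

    finish_release(&page, cpu0_last);   /* 2 -> 1 */
    finish_release(&page, cpu1_last);   /* 1 -> 0, nobody frees */

    return atomic_load(&page.count) == 0 && !page.freed && !page.on_lru;
}
```

racy_release_leaks() returns true for this interleaving: the count reaches
zero through two plain decrements, so the freeing branch never runs.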
> This is nice and quick, but racy. Two concurrent page_cache_releases
> will create a zero-ref unfreed page which is on the LRU. These are
> rare, and can be mopped up in page reclaim.
>
> The above code will also work for pages which aren't on the LRU. It will
> take the lock unnecessarily for (say) slab pages. But if we put slab pages
> on the LRU then I suspect there are so few non-LRU pages left that it isn't
> worth bothering about this.
No, it will not work. See above.
> [1] The race requires that the CPU running page_cache_release find a
> five instruction window against the CPU running shrink_cache. And
> that they be operating against the same page. And that the CPU
> running __page_cache_release() then take an interrupt in a 3-4
> instruction window. And that the interrupt take longer than the
> runtime for shrink_list. And that the page be the first page in
> the pagevec.
The interrupt can also be a preemption which might easily take long
enough. But I agree that the race is now rare. The real problem is
that the locking rules don't guarantee that there are no other racy
paths that we both missed.
regards Christian
--
THAT'S ALL FOLKS!