From: Andrea Arcangeli <andrea@suse.de>
To: Linus Torvalds <torvalds@transmeta.com>,
MOLNAR Ingo <mingo@redhat.com>,
"David S. Miller" <davem@redhat.com>
Cc: linux-kernel@vger.rutgers.edu, linux-mm@kvack.org
Subject: Re: [patch] minimal page-LRU
Date: Wed, 4 Aug 1999 19:07:33 +0200 (CEST) [thread overview]
Message-ID: <Pine.LNX.4.10.9908041725380.401-100000@laser.random> (raw)
In-Reply-To: <Pine.LNX.4.10.9908041310460.2739-100000@laser.random>
On Wed, 4 Aug 1999, Andrea Arcangeli wrote:
>tried to stay in sync as much as possible with the current 2.3.12 VM. The
>patch should be the _shorter_ as possible and it should be safe as far as
>the 2.3.12 VM is safe.
Continuing thinking about the code I just found a race condition between
the page-LRU shrink_mmap and truncate_inode_pages() ;).
The SMP race looks like this:
shrink_mmap cpu 0 truncate_inode_pages cpu 1
----------------- --------------------------
- remove page from lru list
- release lru lock
- grab pagecache_lock
- increase page count
- lockdown the page
- flushpage and remove
from lru list but the
page was not present
any lru lists!!
To fix this race I must simply avoid to release the pagemap_lru lock
if (the page is not queued in any lru list &&
I have not yet locked down the page)
Once I have locked down the page I can also safely relase the lru lock
even if the page is not yet queued in any lru list, since
truncate_inode_pages() will wait me in lock_page() (and if
truncate_inode_pages() is waiting me I'll give up in the page_count()
check that it's done atomically with the pagecache_lock held).
I also need to access the local lru queues with the spinlock held since
truncate_inode_page() may remove the page from my local lru queue while I
am working on the global lru lists.
This fix is incremental with the minimal lru patch against 2.3.12 I posted
some hours ago.
--- 2.3.12-lru1/mm/filemap.c Wed Aug 4 18:35:59 1999
+++ 2.3.12-lru/mm/filemap.c Wed Aug 4 18:37:51 1999
@@ -238,35 +238,42 @@
page = list_entry(page_lru, struct page, lru);
list_del(page_lru);
+ dispose = lru;
if (test_and_clear_bit(PG_referenced, &page->flags))
- {
/* Roll the page at the top of the lru list,
* we could also be more aggressive putting
* the page in the young-dispose-list, so
* avoiding to free young pages in each pass.
*/
- list_add(page_lru, lru);
- continue;
- }
- spin_unlock(&pagemap_lru_lock);
+ goto dispose_continue;
dispose = &old;
+ /* don't account passes over not DMA pages */
if ((gfp_mask & __GFP_DMA) && !PageDMA(page))
goto dispose_continue;
(*count)--;
dispose = &young;
+ if (TryLockPage(page))
+ goto dispose_continue;
+
+ /* Release the pagemap_lru lock even if the page is not yet
+ queued in any lru queue since we have just locked down
+ the page so nobody else may SMP race with us running
+ a lru_cache_del() (lru_cache_del() always run with the
+ page locked down ;). */
+ spin_unlock(&pagemap_lru_lock);
+
/* avoid unscalable SMP locking */
if (!page->buffers && page_count(page) > 1)
- goto dispose_continue;
+ goto unlock_noput_continue;
+ /* Take the pagecache_lock spinlock held to avoid
+ other tasks to notice the page while we are looking at its
+ page count. If it's a pagecache-page we'll free it
+ in one atomic transaction after checking its page count. */
spin_lock(&pagecache_lock);
- if (TryLockPage(page))
- {
- spin_unlock(&pagecache_lock);
- goto dispose_continue;
- }
/* avoid freeing the page while it's locked */
get_page(page);
@@ -326,11 +333,21 @@
unlock_continue:
UnlockPage(page);
put_page(page);
+dispose_relock_continue:
+ /* even if the dispose list is local, a truncate_inode_page()
+ may remove a page from its queue so always
+ synchronize with the lru lock while accesing the
+ page->lru field */
+ spin_lock(&pagemap_lru_lock);
+ list_add(page_lru, dispose);
+ continue;
+
+unlock_noput_continue:
+ UnlockPage(page);
+ goto dispose_relock_continue;
+
dispose_continue:
- /* no need of the spinlock to play with the
- local dispose lists */
list_add(page_lru, dispose);
- spin_lock(&pagemap_lru_lock);
}
goto out;
Andrea
PS. I also added some more comment ;).
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/
next prev parent reply other threads:[~1999-08-04 17:07 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
1999-08-04 11:33 Andrea Arcangeli
1999-08-04 11:50 ` Ingo Molnar
1999-08-04 17:07 ` Andrea Arcangeli [this message]
1999-08-04 18:05 ` Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.10.9908041725380.401-100000@laser.random \
--to=andrea@suse.de \
--cc=davem@redhat.com \
--cc=linux-kernel@vger.rutgers.edu \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=torvalds@transmeta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox