From: Andrea Arcangeli <andrea@suse.de>
To: Chuck Lever <cel@monkey.org>
Cc: "Stephen C. Tweedie" <sct@redhat.com>,
Kanoj Sarcar <kanoj@google.engr.sgi.com>,
torvalds@transmeta.com, linux-mm@kvack.org
Subject: Re: filecache/swapcache questions [RFC] [RFT] [PATCH] kanoj-mm12-2.3.8 Fix swapoff races
Date: Tue, 29 Jun 1999 00:48:18 +0200 (CEST)
Message-ID: <Pine.LNX.4.10.9906290032460.1588-100000@laser.random>
In-Reply-To: <Pine.BSO.4.10.9906281648010.24888-100000@funky.monkey.org>
On Mon, 28 Jun 1999, Chuck Lever wrote:
>that doesn't hurt because try_to_free_pages() doesn't acquire anything but
>the kernel lock in my patch. it looks something like:
>
>int try_to_free_pages(unsigned int gfp_mask)
>{
>	int priority = 6;
>	int count = pager_daemon.swap_cluster;
>
>	wake_up_process(kswapd_process);
>
>	lock_kernel();
>	do {
>		while (shrink_mmap(priority, gfp_mask)) {
>			if (!--count)
>				goto done;
>		}
>
>		shrink_dcache_memory(priority, gfp_mask);
>	} while (--priority >= 0);
>done:
>	/* maybe slow this thread down while kswapd catches up */
>	if (gfp_mask & __GFP_WAIT) {
>		current->policy |= SCHED_YIELD;
>		schedule();
>	}
>	unlock_kernel();
>	return 1;
>}
How do you decide "when" to start the swap activities? Maybe you have a
separate function that does what the current try_to_free_pages() does,
and you call it only from kswapd?
My guess is that you'll end up with zero cache and you'll have to page in
from disk like h*ell when you reach swap, resulting in really bad
interactive behaviour.
I think that being able to swap out from process context is a very nice
feature because it causes the thrashing task to block. This may not look
very important with the current low_on_memory bit, but here I have a
per-task `trashing_memory' bitflag :).
Anyway, we may re-implement recursive semaphores to avoid deadlocking in
the page fault path...
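A recursive semaphore lets the task that already holds it take it again
without blocking, which is what would avoid a self-deadlock if a page fault
re-entered a path that takes the same lock. The following is only a
userspace sketch with POSIX threads, not kernel code; the names rec_sem,
rec_down, rec_up and rec_depth are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical recursive lock: the current owner may re-acquire it. */
struct rec_sem {
	pthread_mutex_t guard;    /* protects the fields below */
	pthread_cond_t released;  /* signalled when depth drops to 0 */
	pthread_t owner;          /* valid only while owned is true */
	bool owned;
	unsigned int depth;       /* nesting count held by the owner */
};

void rec_sem_init(struct rec_sem *s)
{
	pthread_mutex_init(&s->guard, NULL);
	pthread_cond_init(&s->released, NULL);
	s->owned = false;
	s->depth = 0;
}

void rec_down(struct rec_sem *s)
{
	pthread_mutex_lock(&s->guard);
	if (s->owned && pthread_equal(s->owner, pthread_self())) {
		/* Re-entry by the owner: just count it, never block. */
		s->depth++;
	} else {
		/* Another thread (or nobody) owns it: wait until free. */
		while (s->owned)
			pthread_cond_wait(&s->released, &s->guard);
		s->owned = true;
		s->owner = pthread_self();
		s->depth = 1;
	}
	pthread_mutex_unlock(&s->guard);
}

void rec_up(struct rec_sem *s)
{
	pthread_mutex_lock(&s->guard);
	/* Only the owner may release, and only if it holds the lock. */
	assert(s->owned && pthread_equal(s->owner, pthread_self()));
	if (--s->depth == 0) {
		s->owned = false;
		pthread_cond_signal(&s->released);
	}
	pthread_mutex_unlock(&s->guard);
}

/* Current nesting depth, 0 when the lock is free. */
unsigned int rec_depth(struct rec_sem *s)
{
	unsigned int d;

	pthread_mutex_lock(&s->guard);
	d = s->depth;
	pthread_mutex_unlock(&s->guard);
	return d;
}
```

Calling rec_down() twice from the same thread leaves the depth at 2, and
the lock becomes free again only after two matching rec_up() calls.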
>the eventual goal of my adventure is to drop the kernel lock while doing
>the page COW in do_wp_page, since in 2.3.6+, the COW is again protected
>because of race conditions with kswapd. this "protection" serializes all
I thought a bit about that as well. I also coded a possible solution.
Look at this snapshot:
Index: linux/mm/memory.c
===================================================================
RCS file: /var/cvs/linux/mm/memory.c,v
retrieving revision 1.1.1.10
retrieving revision 1.1.2.39
diff -u -r1.1.1.10 -r1.1.2.39
--- linux/mm/memory.c 1999/06/28 15:10:09 1.1.1.10
+++ linux/mm/memory.c 1999/06/28 17:08:59 1.1.2.39
@@ -607,16 +618,23 @@
struct page * page;
new_page = __get_free_page(GFP_USER);
- /* Did swap_out() unmap the protected page while we slept? */
- if (pte_val(*page_table) != pte_val(pte))
- goto end_wp_page;
old_page = pte_page(pte);
if (MAP_NR(old_page) >= max_mapnr)
goto bad_wp_page;
tsk->min_flt++;
page = mem_map + MAP_NR(old_page);
-
+
+ lock_page(page);
/*
+ * We can release the big kernel lock here since
+ * kswapd will see the page locked. -Andrea
+ */
+ unlock_kernel();
+ /* Did swap_out() unmap the protected page while we slept? */
+ if (pte_val(*page_table) != pte_val(pte))
+ goto end_wp_page;
+
+ /*
* We can avoid the copy if:
* - we're the only user (count == 1)
* - the only other user is the swap cache,
@@ -630,19 +648,15 @@
break;
if (swap_count(page->offset) != 1)
break;
+ lru_unmap_cache(page);
delete_from_swap_cache(page);
+ put_page_refcount(page);
/* FallThrough */
case 1:
flush_cache_page(vma, address);
set_pte(page_table, pte_mkdirty(pte_mkwrite(pte)));
flush_tlb_page(vma, address);
-end_wp_page:
- /*
- * We can release the kernel lock now.. Now swap_out will see
- * a dirty page and so won't get confused and flush_tlb_page
- * won't SMP race. -Andrea
- */
- unlock_kernel();
+ UnlockPage(page);
if (new_page)
free_page(new_page);
@@ -652,6 +666,7 @@
if (!new_page)
goto no_new_page;
+ lru_unmap_cache(page);
if (PageReserved(page))
++vma->vm_mm->rss;
copy_cow_page(old_page,new_page);
@@ -660,18 +675,26 @@
flush_cache_page(vma, address);
set_pte(page_table, pte_mkwrite(pte_mkdirty(mk_pte(new_page, vma->vm_page_prot))));
flush_tlb_page(vma, address);
- unlock_kernel();
+ UnlockPage(page);
__free_page(page);
return 1;
bad_wp_page:
+ unlock_kernel();
printk("do_wp_page: bogus page at address %08lx (%08lx)\n",address,old_page);
send_sig(SIGKILL, tsk, 1);
-no_new_page:
- unlock_kernel();
if (new_page)
free_page(new_page);
return 0;
+no_new_page:
+ UnlockPage(page);
+ oom(tsk);
+ return 0;
+end_wp_page:
+ UnlockPage(page);
+ if (new_page)
+ free_page(new_page);
+ return 1;
}
/*
It's only a partial snapshot, but it should show the picture. Basically I
lock down the page while still holding the big kernel lock; then, once I
have the page locked (I may sleep while acquiring the page lock), I drop
the big kernel lock and check whether kswapd freed the mapping in the
meantime. If it did not, I can go ahead without the big kernel lock. It
basically works, but I have not had time to test it carefully yet.
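The ordering described above (take the page lock, drop the big lock,
re-check the pte) can be modelled in userspace roughly like this. It is a
sketch with POSIX mutexes and not the kernel code: big_lock, struct
page_model and wp_fault_model are hypothetical names, and the model
assumes the reclaiming side only ever needs the per-page lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stands in for the big kernel lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical model of a page and the pte that maps it. */
struct page_model {
	pthread_mutex_t lock;  /* stands in for the per-page lock */
	unsigned long pte;     /* stands in for *page_table */
};

void page_model_init(struct page_model *pg, unsigned long pte)
{
	pthread_mutex_init(&pg->lock, NULL);
	pg->pte = pte;
}

/*
 * pte_seen is the value the fault handler read before it could sleep.
 * Returns true if the mapping was unchanged and was updated to new_pte,
 * false if it changed in the meantime and the fault must be retried.
 */
bool wp_fault_model(struct page_model *pg, unsigned long pte_seen,
		    unsigned long new_pte)
{
	bool unchanged;

	assert(pg != NULL);

	pthread_mutex_lock(&big_lock);
	/* Take the page lock first; this may block. */
	pthread_mutex_lock(&pg->lock);
	/* From here the page lock alone keeps the reclaimer away. */
	pthread_mutex_unlock(&big_lock);

	/* Re-check: was the page unmapped while we were waiting? */
	unchanged = (pg->pte == pte_seen);
	if (unchanged)
		pg->pte = new_pte;

	pthread_mutex_unlock(&pg->lock);
	return unchanged;
}
```

A caller that passes a stale pte_seen gets false back and the pte is left
untouched, which corresponds to the end_wp_page exit in the patch.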
Andrea