linux-mm.kvack.org archive mirror
* A couple of questions
From: Neil Booth @ 1999-03-02 13:11 UTC
  To: linux-mm

I have a couple of questions about do_wp_page; I hope they're welcome
here.

1) do_wp_page has most execution paths doing an unlock_kernel() but
there are a couple that don't. Why isn't this inconsistent? e.g. any of
the branches that call end_wp_page do not unlock the kernel. What am I
missing? Is it that these branches only happen if we slept while getting
the free page, and sleeping always unlocks the kernel?

2) The last 2 of the 3 branches to end_wp_page seem to me to be
impossible code paths.

	if (!pte_present(pte))
		goto end_wp_page;
	if (pte_write(pte))
		goto end_wp_page;

At entry, pte (= *page_table) is present and not writable as this is the
only way do_wp_page gets called from handle_pte_fault (and we hold the
kernel lock so nothing else can change *page_table). Being a local
variable, its contents cannot change, so why these 2 tests?

Cheers,

Neil.
--
To unsubscribe, send a message with 'unsubscribe linux-mm my@address'
in the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

* Re: A couple of questions
From: Stephen C. Tweedie @ 1999-03-15 18:58 UTC
  To: Neil Booth; +Cc: linux-mm, Stephen Tweedie

Hi,

<Late answer: I've been offline for a couple of weeks>

On Tue, 02 Mar 1999 22:11:45 +0900, Neil Booth <NeilB@earthling.net> said:

> I have a couple of questions about do_wp_page; I hope they're welcome
> here.

> 1) do_wp_page has most execution paths doing an unlock_kernel() but
> there are a couple that don't. Why isn't this inconsistent? 

Good question, and a possible bug.  Anyone else care to glance at this?
It's a possible problem only on SMP, of course.  The obvious fix is:

----------------------------------------------------------------
--- mm/memory.c~	Tue Jan 19 01:33:10 1999
+++ mm/memory.c	Mon Mar 15 18:57:31 1999
@@ -651,13 +651,13 @@
 		delete_from_swap_cache(page_map);
 		/* FallThrough */
 	case 1:
-		/* We can release the kernel lock now.. */
-		unlock_kernel();
-
 		flush_cache_page(vma, address);
 		set_pte(page_table, pte_mkdirty(pte_mkwrite(pte)));
 		flush_tlb_page(vma, address);
 end_wp_page:
+		/* We can release the kernel lock now.. */
+		unlock_kernel();
+
 		if (new_page)
 			free_page(new_page);
 		return 1;
----------------------------------------------------------------
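A minimal userspace C sketch of the single-exit shape this patch gives do_wp_page(): every return, including the early "nothing to do" exits, passes through one unlock. It assumes a hypothetical depth counter (toy_lock_kernel()/toy_unlock_kernel()) standing in for the kernel lock and a PTE modeled as a plain unsigned long; wp_fault_sketch() and TOY_WRITE_BIT are illustrative names, not kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel lock: a depth counter, not a real lock. */
static int toy_lock_depth = 0;

static void toy_lock_kernel(void) { toy_lock_depth++; }

static void toy_unlock_kernel(void)
{
    assert(toy_lock_depth > 0);     /* catch an unlock without a matching lock */
    toy_lock_depth--;
}

#define TOY_WRITE_BIT 0x2UL         /* hypothetical "writable" flag */

/*
 * Hypothetical write-fault handler, entered with the toy lock held.
 * Both the normal path and the early "entry changed" path leave through
 * end_wp, so the lock is released exactly once on every return.
 */
static int wp_fault_sketch(unsigned long *page_table, unsigned long snapshot)
{
    void *new_page = malloc(64);    /* spare page; unused on these paths */

    if (*page_table != snapshot)    /* entry changed: nothing to do */
        goto end_wp;

    *page_table = snapshot | TOY_WRITE_BIT;   /* reuse the page in place */

end_wp:
    toy_unlock_kernel();            /* the single release point */
    free(new_page);                 /* free(NULL) is a harmless no-op */
    return 1;
}
```

A caller takes the toy lock, calls wp_fault_sketch(), and can check that toy_lock_depth is back to zero whether or not the early exit was taken.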

> 2) The last 2 of the 3 branches to end_wp_page seem to me to be
> impossible code paths.

> 	if (!pte_present(pte))
> 		goto end_wp_page;
> 	if (pte_write(pte))
> 		goto end_wp_page;

No, the start of do_wp_page() looks like:

	pte = *page_table;
	new_page = __get_free_page(GFP_USER);

and the get_free_page() call can block if we are out of memory, dropping
the kernel lock in the process.  The page table can be modified by
kswapd during this interval.
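A minimal C sketch of the snapshot-and-recheck pattern described here, assuming a PTE modeled as a plain unsigned long and a hypothetical while_asleep hook that stands in for whatever kswapd does to the page table while the lock is dropped. The names toy_pte_t, wp_snapshot_still_valid() and toy_swap_out() are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long toy_pte_t;    /* hypothetical: a PTE as a plain word */

/* Hypothetical hook run while the lock is dropped during the allocation. */
typedef void (*toy_sleep_hook)(toy_pte_t *page_table);

/*
 * Returns true if the write fault may proceed with its snapshot,
 * false if the entry changed while we slept and we must back out.
 */
static bool wp_snapshot_still_valid(toy_pte_t *page_table,
                                    toy_sleep_hook while_asleep)
{
    toy_pte_t pte = *page_table;    /* snapshot taken with the lock held */

    /* The allocation may block: the lock is dropped and others may run. */
    if (while_asleep != NULL)
        while_asleep(page_table);
    /* ...the lock is held again from here on. */

    /* "Did someone else copy this page for us while we slept?" */
    return *page_table == pte;
}

/* Example hook: pretend the page was swapped out and the entry cleared. */
static void toy_swap_out(toy_pte_t *page_table)
{
    *page_table = 0;
}
```

Without interference the snapshot is still valid; with toy_swap_out() running in the window, the comparison fails and the caller would take the end_wp_page exit.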

--Stephen

* Re: A couple of questions
From: neil @ 1999-03-15 22:46 UTC
  To: Stephen C. Tweedie; +Cc: Linux-MM

Hi Stephen,

Stephen C. Tweedie wrote:-
> Hi,
> 
[..snip..]
>
> > 2) The last 2 of the 3 branches to end_wp_page seem to me to be
> > impossible code paths.
> 
> > 	if (!pte_present(pte))
> > 		goto end_wp_page;
> > 	if (pte_write(pte))
> > 		goto end_wp_page;
> 
> No, the start of do_wp_page() looks like:
> 
> 	pte = *page_table;
> 	new_page = __get_free_page(GFP_USER);
> 
> and the get_free_page() call can block if we are out of memory, dropping
> the kernel lock in the process.  The page table can be modified by
> kswapd during this interval.

Thanks for your reply.  I think you've missed my point on this one.
The variable "pte" is set before calling __get_free_page(), and being
local cannot be modified by other processes.  Hence I still believe
the 2 branches shown are impossible, since the opposite of each test
was the condition for entering do_wp_page().

The case you mention is captured by the initial test

	if (pte_val(*page_table) != pte_val(pte))
		goto end_wp_page;

performed before the two above.  Do you agree?
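This argument can be checked exhaustively in a small C model, assuming (as argued above) that the local snapshot is the same value handle_pte_fault() tested before calling do_wp_page(). The two-bit PTE layout and the toy_* helpers are hypothetical; real PTE bits are architecture-specific.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit layout; real PTE bits are architecture-specific. */
#define TOY_PTE_PRESENT 0x1UL
#define TOY_PTE_WRITE   0x2UL

typedef unsigned long toy_pte_t;

static bool toy_pte_present(toy_pte_t pte) { return pte & TOY_PTE_PRESENT; }
static bool toy_pte_write(toy_pte_t pte)   { return pte & TOY_PTE_WRITE; }

/* Entry condition: only a present, read-only entry reaches the handler. */
static bool reaches_wp_handler(toy_pte_t pte)
{
    return toy_pte_present(pte) && !toy_pte_write(pte);
}

/*
 * Count the bit patterns for which one of the two disputed tests, applied
 * to the local snapshot, could fire.  The snapshot is a local copy, so its
 * bits are exactly the ones the entry condition already checked.
 */
static int count_reachable_disputed_exits(void)
{
    int fired = 0;

    for (toy_pte_t pte = 0; pte <= (TOY_PTE_PRESENT | TOY_PTE_WRITE); pte++) {
        if (!reaches_wp_handler(pte))
            continue;               /* would never enter do_wp_page() */
        if (!toy_pte_present(pte) || toy_pte_write(pte))
            fired++;                /* a disputed test would fire */
    }
    return fired;
}
```

count_reachable_disputed_exits() returns 0: every snapshot that passes the entry test also passes the two disputed tests, so only the pte_val() comparison against *page_table can send the call to end_wp_page.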

Cheers,

Neil.

* Re: A couple of questions
From: Andrea Arcangeli @ 1999-03-16  2:11 UTC
  To: Stephen C. Tweedie; +Cc: Neil Booth, linux-mm, Linus Torvalds

On Mon, 15 Mar 1999, Stephen C. Tweedie wrote:

>--- mm/memory.c~	Tue Jan 19 01:33:10 1999
>+++ mm/memory.c	Mon Mar 15 18:57:31 1999
>@@ -651,13 +651,13 @@
> 		delete_from_swap_cache(page_map);
> 		/* FallThrough */
> 	case 1:
>-		/* We can release the kernel lock now.. */
>-		unlock_kernel();
>-
> 		flush_cache_page(vma, address);
> 		set_pte(page_table, pte_mkdirty(pte_mkwrite(pte)));
> 		flush_tlb_page(vma, address);
> end_wp_page:
>+		/* We can release the kernel lock now.. */
>+		unlock_kernel();
>+
> 		if (new_page)
> 			free_page(new_page);
> 		return 1;
>----------------------------------------------------------------

I think your patch is safe and strictly needed, in order to release
the kernel lock on the end_wp_page path.

I think it's safe to release the kernel lock before updating the
process's page table because the swap_out engine will do nothing with
the page until it is a clean page (and it should be clean, because it
was read-only in the first place... am I really right here?).  Every
other part of the VM will block on the mmap semaphore, so it can't race
with the page fault handler anyway.

I think this patch against 2.2.3 is needed (except the first chunk,
which only removes superfluous code).

It seems to work fine after a few minutes of stress-testing.

Index: mm//memory.c
===================================================================
RCS file: /var/cvs/linux/mm/memory.c,v
retrieving revision 1.1.2.3
diff -u -r1.1.2.3 memory.c
--- memory.c	1999/01/24 02:46:31	1.1.2.3
+++ linux/mm/memory.c	1999/03/16 01:55:45
@@ -624,10 +624,6 @@
 	/* Did someone else copy this page for us while we slept? */
 	if (pte_val(*page_table) != pte_val(pte))
 		goto end_wp_page;
-	if (!pte_present(pte))
-		goto end_wp_page;
-	if (pte_write(pte))
-		goto end_wp_page;
 	old_page = pte_page(pte);
 	if (MAP_NR(old_page) >= max_mapnr)
 		goto bad_wp_page;
@@ -651,13 +647,18 @@
 		delete_from_swap_cache(page_map);
 		/* FallThrough */
 	case 1:
-		/* We can release the kernel lock now.. */
+		/*
+		 * We can release the kernel lock now.. because the swap_out
+		 * engine will do nothing with the page table until it
+		 * will be a clean page (and we are sure it's clean because it
+		 * wasn't writable yet). All other parts of the VM will
+		 * stop on the mmap semaphore. -arca
+		 */
 		unlock_kernel();
 
 		flush_cache_page(vma, address);
 		set_pte(page_table, pte_mkdirty(pte_mkwrite(pte)));
 		flush_tlb_page(vma, address);
-end_wp_page:
 		if (new_page)
 			free_page(new_page);
 		return 1;
@@ -681,9 +682,15 @@
 bad_wp_page:
 	printk("do_wp_page: bogus page at address %08lx (%08lx)\n",address,old_page);
 	send_sig(SIGKILL, tsk, 1);
+	unlock_kernel();
 	if (new_page)
 		free_page(new_page);
 	return 0;
+end_wp_page:
+	unlock_kernel();
+	if (new_page)
+		free_page(new_page);
+	return 1;
 }
 
 /*



Andrea Arcangeli


* Re: A couple of questions
From: Stephen C. Tweedie @ 1999-03-16 12:22 UTC
  To: neil; +Cc: Stephen C. Tweedie, Linux-MM

Hi,

On Tue, 16 Mar 1999 07:46:06 +0900, neil@tc-1-192.ariake.gol.ne.jp
said:

> Thanks for your reply.  I think you've missed my point on this one.
> The variable "pte" is set before calling __get_free_page(), and being
> local cannot be modified by other processes.  

Umm, OK, you've convinced me. :) I think we have enough locks held
throughout this to prevent the present or writable bits in *page_table
from changing between the test in handle_pte_fault() and do_wp_page()
itself, even on SMP.

--Stephen