* Page Migration: Make do_swap_page redo the fault
@ 2006-04-04 5:33 Christoph Lameter
2006-04-08 12:16 ` Hugh Dickins
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Lameter @ 2006-04-04 5:33 UTC (permalink / raw)
To: akpm; +Cc: linux-mm
It is better to redo the complete fault if do_swap_page() finds
that the page is not in PageSwapCache() because the page migration
code may have replaced the swap pte already with a pte pointing
to valid memory.
do_swap_page may interpret an invalid swap entry without this patch
because we do not reload the pte if we are looping back. The page
migration code may already have reused the swap entry referenced by our
local swp_entry.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.17-rc1/mm/memory.c
===================================================================
--- linux-2.6.17-rc1.orig/mm/memory.c 2006-04-02 20:22:10.000000000 -0700
+++ linux-2.6.17-rc1/mm/memory.c 2006-04-03 22:22:56.000000000 -0700
@@ -1879,7 +1879,6 @@ static int do_swap_page(struct mm_struct
goto out;
entry = pte_to_swp_entry(orig_pte);
-again:
page = lookup_swap_cache(entry);
if (!page) {
swapin_readahead(entry, address, vma);
@@ -1907,7 +1906,7 @@ again:
/* Page migration has occured */
unlock_page(page);
page_cache_release(page);
- goto again;
+ goto out;
}
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Page Migration: Make do_swap_page redo the fault
2006-04-04 5:33 Page Migration: Make do_swap_page redo the fault Christoph Lameter
@ 2006-04-08 12:16 ` Hugh Dickins
2006-04-08 18:25 ` Christoph Lameter
0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2006-04-08 12:16 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-mm
On Mon, 3 Apr 2006, Christoph Lameter wrote:
> It is better to redo the complete fault if do_swap_page() finds
> that the page is not in PageSwapCache() because the page migration
> code may have replaced the swap pte already with a pte pointing
> to valid memory.
>
> do_swap_page may interpret an invalid swap entry without this patch
> because we do not reload the pte if we are looping back. The page
> migration code may already have reused the swap entry referenced by our
> local swp_entry.
Wouldn't you better just remove that !PageSwapCache "Page migration has
occured" block? Isn't that case already dealt with by the old !pte_same
check below it?
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Page Migration: Make do_swap_page redo the fault
2006-04-08 12:16 ` Hugh Dickins
@ 2006-04-08 18:25 ` Christoph Lameter
2006-04-08 19:26 ` Hugh Dickins
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Lameter @ 2006-04-08 18:25 UTC (permalink / raw)
To: Hugh Dickins; +Cc: akpm, linux-mm
On Sat, 8 Apr 2006, Hugh Dickins wrote:
> > do_swap_page may interpret an invalid swap entry without this patch
> > because we do not reload the pte if we are looping back. The page
> > migration code may already have reused the swap entry referenced by our
> > local swp_entry.
>
> Wouldn't you better just remove that !PageSwapCache "Page migration has
> occured" block? Isn't that case already dealt with by the old !pte_same
> check below it?
Right. Since we now replace the swap ptes with ptes pointing to pages
before unlocking the page this is no longer necessary (if the ptes
contents are checked later). That of course means that remove_from_swap()
must always succeed.
Hmmm..,. There are still two other checks for !PageSwapCache after
obtaining a page lock in shmem_getpage() and in try_to_unuse().
However, both are getting to the page via the swap maps. So we need to
keep those.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c 2006-04-02 21:55:26.000000000 -0700
+++ linux-2.6/mm/memory.c 2006-04-08 11:08:33.000000000 -0700
@@ -1903,12 +1903,6 @@ again:
mark_page_accessed(page);
lock_page(page);
- if (!PageSwapCache(page)) {
- /* Page migration has occured */
- unlock_page(page);
- page_cache_release(page);
- goto again;
- }
/*
* Back out if somebody else already faulted in this pte.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Page Migration: Make do_swap_page redo the fault
2006-04-08 18:25 ` Christoph Lameter
@ 2006-04-08 19:26 ` Hugh Dickins
2006-04-08 21:39 ` Christoph Lameter
0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2006-04-08 19:26 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-mm
On Sat, 8 Apr 2006, Christoph Lameter wrote:
>
> Hmmm..,. There are still two other checks for !PageSwapCache after
> obtaining a page lock in shmem_getpage() and in try_to_unuse().
> However, both are getting to the page via the swap maps. So we need to
> keep those.
Sure, those are long standing checks, necessary long before migration
came on the scene; whereas the check in do_swap_page was recently added
just for a page migration case, and now turns out to be redundant.
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Page Migration: Make do_swap_page redo the fault
2006-04-08 19:26 ` Hugh Dickins
@ 2006-04-08 21:39 ` Christoph Lameter
2006-04-09 3:11 ` Hugh Dickins
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Lameter @ 2006-04-08 21:39 UTC (permalink / raw)
To: Hugh Dickins; +Cc: akpm, linux-mm
On Sat, 8 Apr 2006, Hugh Dickins wrote:
> On Sat, 8 Apr 2006, Christoph Lameter wrote:
> >
> > Hmmm..,. There are still two other checks for !PageSwapCache after
> > obtaining a page lock in shmem_getpage() and in try_to_unuse().
> > However, both are getting to the page via the swap maps. So we need to
> > keep those.
>
> Sure, those are long standing checks, necessary long before migration
> came on the scene; whereas the check in do_swap_page was recently added
> just for a page migration case, and now turns out to be redundant.
Those two checks were added for migration together with the one we
are removing now. Sounds like you think they additionally fix some other
race conditions?
The check we are discussing only becomes unnecessary if the swap ptes are
replaced by regular ptes. The swap pte would refer to the old page from
which the SwapCache bit was cleared. This is dependent on remove_from_swap
always functioning properly which happened pretty late in the 2.6.16
cycle.
Here is the description from V9 of the direct migration patchset which
introduced the 3 checks for PageSwapCache():
Check for PageSwapCache after looking up and locking a swap page.
The page migration code may change a swap pte to point to a different page
under lock_page().
If that happens then the vm must retry the lookup operation in the swap
space to find the correct page number. There are a couple of locations
in the VM where a lock_page() is done on a swap page. In these locations
we need to check afterwards if the page was migrated. If the page was
migrated
then the old page that was looked up before was freed and no longer has
the
PageSwapCache bit set.
Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Christoph Lameter <clameter@@sgi.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Page Migration: Make do_swap_page redo the fault
2006-04-08 21:39 ` Christoph Lameter
@ 2006-04-09 3:11 ` Hugh Dickins
2006-04-10 18:54 ` Hugh Dickins
0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2006-04-09 3:11 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-mm
On Sat, 8 Apr 2006, Christoph Lameter wrote:
> On Sat, 8 Apr 2006, Hugh Dickins wrote:
> >
> > Sure, those are long standing checks, necessary long before migration
> > came on the scene; whereas the check in do_swap_page was recently added
> > just for a page migration case, and now turns out to be redundant.
>
> Those two checks were added for migration together with the one we
> are removing now. Sounds like you think they additionally fix some other
> race conditions?
Of course, you're right - sorry. Whatever was I looking at,
to get it so confidently wrong? Dunno: scary.
But I do have to worry then. I'd missed the addition of those checks:
if they really are necessary, then the rules have changed in two
tricky areas I now need to re-understand. It'll take me a while.
Thanks for setting me straight.
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Page Migration: Make do_swap_page redo the fault
2006-04-09 3:11 ` Hugh Dickins
@ 2006-04-10 18:54 ` Hugh Dickins
2006-04-10 20:19 ` Christoph Lameter
0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2006-04-10 18:54 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linux-mm
On Sun, 9 Apr 2006, Hugh Dickins wrote:
> On Sat, 8 Apr 2006, Christoph Lameter wrote:
> >
> > Those two checks were added for migration together with the one we
> > are removing now. Sounds like you think they additionally fix some other
> > race conditions?
>
> But I do have to worry then. I'd missed the addition of those checks:
> if they really are necessary, then the rules have changed in two
> tricky areas I now need to re-understand. It'll take me a while.
I have now checked through, and I'm relieved to conclude that neither
of those other two PageSwapCache rechecks are necessary; and the rules
are much as before.
In the try_to_unuse case, it's quite possible that !PageSwapCache there,
because of a racing delete_from_swap_cache; but that case is correctly
handled in the code that follows.
In the shmem_getpage case, info->lock is held to ensure that a racing
shmem_getpage or shmem_unuse_inode can't change it to !PageSwapCache.
In neither case can page migration interfere, because we're holding a
reference on the page: acquired within find_get_page's tree_lock (or
in the initial page allocation before add_to_swap_cache).
migrate_page_remove_references is careful to check page_count against
nr_refs within the tree_lock, and back out if page_count is raised.
If it didn't do so, most uses of find_get_page would be unsafe.
So I believe we can safely remove these other two
"Page migration has occured" blocks - can't we?
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Page Migration: Make do_swap_page redo the fault
2006-04-10 18:54 ` Hugh Dickins
@ 2006-04-10 20:19 ` Christoph Lameter
2006-04-11 14:58 ` Lee Schermerhorn
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Lameter @ 2006-04-10 20:19 UTC (permalink / raw)
To: Hugh Dickins; +Cc: akpm, Lee Schermerhorn, linux-mm
On Mon, 10 Apr 2006, Hugh Dickins wrote:
> I have now checked through, and I'm relieved to conclude that neither
> of those other two PageSwapCache rechecks are necessary; and the rules
> are much as before.
Note that the removal of the check in do_swap_page does only work
since the remove_from_swap() changes the pte. Without that pte change
do_swap_page could retrieve the old page via the swap map. It would wait
until page migration finished its migration and then find that the page is
not in the pagecache anymore. Note that Lee Schermerhorn's lazy page
migration may rely on disabling remove_from_swap() for his migration
scheme. Lee? Looks like we are putting new barriers in front of you?
> In the try_to_unuse case, it's quite possible that !PageSwapCache there,
> because of a racing delete_from_swap_cache; but that case is correctly
> handled in the code that follows.
Ah. I see a later check
if ((*swap_map > 1) && PageDirty(page) && PageSwapCache(page)) {
> So I believe we can safely remove these other two
> "Page migration has occured" blocks - can't we?
Hmmm... The increased count is also an argument against having to check
for the race in do_swap_page(). So maybe Lee's lazy migration patchset
should also be fine without these checks and there is actually no need
to rely on the ptes not being the same.
Remove two unnecessary PageSwapCache checks. The page refcount is raised
and therefore page migration cannot occur in both functions.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c 2006-04-08 11:14:20.000000000 -0700
+++ linux-2.6/mm/shmem.c 2006-04-10 13:13:43.000000000 -0700
@@ -1079,14 +1079,6 @@ repeat:
page_cache_release(swappage);
goto repeat;
}
- if (!PageSwapCache(swappage)) {
- /* Page migration has occured */
- shmem_swp_unmap(entry);
- spin_unlock(&info->lock);
- unlock_page(swappage);
- page_cache_release(swappage);
- goto repeat;
- }
if (PageWriteback(swappage)) {
shmem_swp_unmap(entry);
spin_unlock(&info->lock);
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c 2006-04-02 21:55:26.000000000 -0700
+++ linux-2.6/mm/swapfile.c 2006-04-10 13:13:01.000000000 -0700
@@ -751,12 +751,6 @@ again:
wait_on_page_locked(page);
wait_on_page_writeback(page);
lock_page(page);
- if (!PageSwapCache(page)) {
- /* Page migration has occured */
- unlock_page(page);
- page_cache_release(page);
- goto again;
- }
wait_on_page_writeback(page);
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Page Migration: Make do_swap_page redo the fault
2006-04-10 20:19 ` Christoph Lameter
@ 2006-04-11 14:58 ` Lee Schermerhorn
2006-04-11 15:55 ` Christoph Lameter
0 siblings, 1 reply; 10+ messages in thread
From: Lee Schermerhorn @ 2006-04-11 14:58 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Hugh Dickins, akpm, linux-mm
On Mon, 2006-04-10 at 13:19 -0700, Christoph Lameter wrote:
> On Mon, 10 Apr 2006, Hugh Dickins wrote:
>
> > I have now checked through, and I'm relieved to conclude that neither
> > of those other two PageSwapCache rechecks are necessary; and the rules
> > are much as before.
>
> Note that the removal of the check in do_swap_page does only work
> since the remove_from_swap() changes the pte. Without that pte change
> do_swap_page could retrieve the old page via the swap map. It would wait
> until page migration finished its migration and then find that the page is
> not in the pagecache anymore. Note that Lee Schermerhorn's lazy page
> migration may rely on disabling remove_from_swap() for his migration
> scheme. Lee? Looks like we are putting new barriers in front of you?
Yes. I noticed. If the current code doesn't depend on these check, I
guess you should probably rip 'em out and I'll carry any necessary check
in my series.
>
> > In the try_to_unuse case, it's quite possible that !PageSwapCache there,
> > because of a racing delete_from_swap_cache; but that case is correctly
> > handled in the code that follows.
>
> Ah. I see a later check
>
> if ((*swap_map > 1) && PageDirty(page) && PageSwapCache(page)) {
>
> > So I believe we can safely remove these other two
> > "Page migration has occured" blocks - can't we?
>
> Hmmm... The increased count is also an argument against having to check
> for the race in do_swap_page(). So maybe Lee's lazy migration patchset
> should also be fine without these checks and there is actually no need
> to rely on the ptes not being the same.
May still be some work in do_swap_page(). The unmap has already
occurred. In the general case [support for migrating pages w/ > 1 pte
mapping], two or more tasks could race faulting the cache pte. IMO one
should perform the migration [replacing old page in cache with new
page], others should block and then use the new page to resolve their
own faults. I think this means a check and then at least another cache
lookup. Maybe redo the fault, as Christoph has said.
Don't know about direct migration.
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Page Migration: Make do_swap_page redo the fault
2006-04-11 14:58 ` Lee Schermerhorn
@ 2006-04-11 15:55 ` Christoph Lameter
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Lameter @ 2006-04-11 15:55 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: Hugh Dickins, akpm, linux-mm
On Tue, 11 Apr 2006, Lee Schermerhorn wrote:
> > Hmmm... The increased count is also an argument against having to check
> > for the race in do_swap_page(). So maybe Lee's lazy migration patchset
> > should also be fine without these checks and there is actually no need
> > to rely on the ptes not being the same.
>
> May still be some work in do_swap_page(). The unmap has already
> occurred. In the general case [support for migrating pages w/ > 1 pte
> mapping], two or more tasks could race faulting the cache pte. IMO one
> should perform the migration [replacing old page in cache with new
> page], others should block and then use the new page to resolve their
> own faults. I think this means a check and then at least another cache
> lookup. Maybe redo the fault, as Christoph has said.
>
> Don't know about direct migration.
In direct migration the reference counts avoid these problems.
The worst case scenario that we have imagined so far assumed that
migration occurs after do_swap_cache has done a lookup of the old page in
the swap_cache. The check that we had in there was envisioned to protect
against the case of migration happening after the lookup_swap_cache()
occurred.
As Hugh noted: lookup_swap_cache() is increasing the page count under
tree_lock. direct migration also checks page count under the tree lock.
So migration cannot occur after the lookup_swap_cache() has been done and
returned the old page.
If lookup_swap_cache returns the new page then we will block on
lock_page() until migration has finished.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-04-11 15:55 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-04-04 5:33 Page Migration: Make do_swap_page redo the fault Christoph Lameter
2006-04-08 12:16 ` Hugh Dickins
2006-04-08 18:25 ` Christoph Lameter
2006-04-08 19:26 ` Hugh Dickins
2006-04-08 21:39 ` Christoph Lameter
2006-04-09 3:11 ` Hugh Dickins
2006-04-10 18:54 ` Hugh Dickins
2006-04-10 20:19 ` Christoph Lameter
2006-04-11 14:58 ` Lee Schermerhorn
2006-04-11 15:55 ` Christoph Lameter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox