* page_lock_anon_vma(): remove check for mapped page @ 2006-02-25 1:03 Christoph Lameter 2006-02-25 14:01 ` Hugh Dickins 0 siblings, 1 reply; 16+ messages in thread From: Christoph Lameter @ 2006-02-25 1:03 UTC (permalink / raw) To: akpm; +Cc: linux-mm, Hugh Dickins Any reason that this function is checking for a mapped page? There could be references through a swap pte to the page. The looping in remove_from_swap, page_referenced_anon and try_to_unmap anon would work even if the check for a mapped page would be removed. I have sent the patch below today to Hugh Dickins but did not receive an answer. Probaby requires some discussion. It is okay to obtain a anon vma lock for a page that is only mapped via a swap pte to the page. This occurs frequently during page migration. The check for a mapped page (requiring regular ptes pointing to the page) gets in the way. Without this patch anonymous pages will have swap ptes after migration that then need to be converted into regular ptes via a page fault. Signed-off-by: Christoph Lameter <clameter@sgi.com> Index: linux-2.6.16-rc4/mm/rmap.c =================================================================== --- linux-2.6.16-rc4.orig/mm/rmap.c 2006-02-17 14:23:45.000000000 -0800 +++ linux-2.6.16-rc4/mm/rmap.c 2006-02-24 13:19:11.000000000 -0800 @@ -196,8 +196,6 @@ static struct anon_vma *page_lock_anon_v anon_mapping = (unsigned long) page->mapping; if (!(anon_mapping & PAGE_MAPPING_ANON)) goto out; - if (!page_mapped(page)) - goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); spin_lock(&anon_vma->lock); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-25 1:03 page_lock_anon_vma(): remove check for mapped page Christoph Lameter @ 2006-02-25 14:01 ` Hugh Dickins 2006-02-26 4:42 ` Hugh Dickins 0 siblings, 1 reply; 16+ messages in thread From: Hugh Dickins @ 2006-02-25 14:01 UTC (permalink / raw) To: Christoph Lameter; +Cc: akpm, linux-mm On Fri, 24 Feb 2006, Christoph Lameter wrote: > Any reason that this function is checking for a mapped page? There could > be references through a swap pte to the page. The looping in > remove_from_swap, page_referenced_anon and try_to_unmap anon would > work even if the check for a mapped page would be removed. > > I have sent the patch below today to Hugh Dickins but did not receive an > answer. Probaby requires some discussion. Good question, and I was on the point of answering that it's just a racy micro-optimization that you could eliminate. But now I think that answer is wrong. It's actually an essential part of the tricky business of getting from the struct page to the anon_vma lock, when there's a danger that the anon_vma and even its slab may be recycled at any instant (remember that we have to leave the anon page->mapping set even after the last page_remove_rmap, with comment there on that). If the page is not found mapped under the rcu_read_lock, then there's no guarantee that the anon_vma memory hasn't already been freed and its slab page destroyed, and recycled for other purposes completely. I'll have to come back to this, and think it through more carefully: I might arrive at the opposite conclusion with more thought this evening. Hugh > > > > It is okay to obtain a anon vma lock for a page that is only mapped > via a swap pte to the page. This occurs frequently during page > migration. The check for a mapped page (requiring regular ptes pointing > to the page) gets in the way. > > Without this patch anonymous pages will have swap ptes after migration > that then need to be converted into regular ptes via a page fault. > > Signed-off-by: Christoph Lameter <clameter@sgi.com> > > Index: linux-2.6.16-rc4/mm/rmap.c > =================================================================== > --- linux-2.6.16-rc4.orig/mm/rmap.c 2006-02-17 14:23:45.000000000 -0800 > +++ linux-2.6.16-rc4/mm/rmap.c 2006-02-24 13:19:11.000000000 -0800 > @@ -196,8 +196,6 @@ static struct anon_vma *page_lock_anon_v > anon_mapping = (unsigned long) page->mapping; > if (!(anon_mapping & PAGE_MAPPING_ANON)) > goto out; > - if (!page_mapped(page)) > - goto out; > > anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); > spin_lock(&anon_vma->lock); > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-25 14:01 ` Hugh Dickins @ 2006-02-26 4:42 ` Hugh Dickins 2006-02-26 5:26 ` Christoph Lameter 2006-02-26 5:57 ` Christoph Lameter 0 siblings, 2 replies; 16+ messages in thread From: Hugh Dickins @ 2006-02-26 4:42 UTC (permalink / raw) To: Christoph Lameter; +Cc: akpm, linux-mm On Sat, 25 Feb 2006, Hugh Dickins wrote: > On Fri, 24 Feb 2006, Christoph Lameter wrote: > > > Any reason that this function is checking for a mapped page? There could > > be references through a swap pte to the page. The looping in > > remove_from_swap, page_referenced_anon and try_to_unmap anon would > > work even if the check for a mapped page would be removed. > > > > I have sent the patch below today to Hugh Dickins but did not receive an > > answer. Probaby requires some discussion. > > Good question, and I was on the point of answering that it's just a > racy micro-optimization that you could eliminate. But now I think > that answer is wrong. It's actually an essential part of the tricky > business of getting from the struct page to the anon_vma lock, when > there's a danger that the anon_vma and even its slab may be recycled > at any instant (remember that we have to leave the anon page->mapping > set even after the last page_remove_rmap, with comment there on that). > If the page is not found mapped under the rcu_read_lock, then there's > no guarantee that the anon_vma memory hasn't already been freed and > its slab page destroyed, and recycled for other purposes completely. > > I'll have to come back to this, and think it through more carefully: I > might arrive at the opposite conclusion with more thought this evening. I still believe the page_mapped test is essential for the correctness of the original page_referenced_anon and try_to_unmap_anon cases (but please don't ask me to reproduce the case it's guarding against!). But disastrous for the remove_from_swap case you've added - how does that work at all with the page_mapped test in? I'm not sure whether testing page_mapcount+page_swapcount (as used by can_share_swap_page) would help remove_from_swap; certainly it would be pointless and better avoided by page_referenced and try_to_unmap. But I think you can avoid it. It looks to me like the mmap_sem of an mm containing the pages is held across migrate_pages? That should be enough to guarantee that the anon_vmas involved cannot be freed behind your back (whereas page_referenced and try_to_unmap are called without any mmap_sem held). So you'd want to add a new flag to page_lock_anon_vma, to condition whether page_mapped is checked. Though I'm not yet certain that that won't have races of its own: please examine it sceptically. And is it actually guaranteed that a relevant mmap_sem is held here? Why on earth does vmscan.c contain EXPORT_SYMBOLs of migrate_page_remove_references, migrate_page_copy, migrate_page? Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-26 4:42 ` Hugh Dickins @ 2006-02-26 5:26 ` Christoph Lameter 2006-02-26 15:58 ` Hugh Dickins 2006-02-26 5:57 ` Christoph Lameter 1 sibling, 1 reply; 16+ messages in thread From: Christoph Lameter @ 2006-02-26 5:26 UTC (permalink / raw) To: Hugh Dickins; +Cc: akpm, linux-mm On Sun, 26 Feb 2006, Hugh Dickins wrote: > I still believe the page_mapped test is essential for the correctness > of the original page_referenced_anon and try_to_unmap_anon cases (but > please don't ask me to reproduce the case it's guarding against!). Well if it is essential then we should have some comment there explaining the purpose of that check. > But disastrous for the remove_from_swap case you've added - > how does that work at all with the page_mapped test in? It doesnt right now. Thats what I am trying to fix. It is not essential for page migration that the swap ptes be removed. If they are left then we need additional page faults to convert the swap ptes to regular one. So it currently slows things down and unnecessarily consume swap entries. > But I think you can avoid it. It looks to me like the mmap_sem of > an mm containing the pages is held across migrate_pages? That should Currently yes, but the hotplug folks may need to use the page migration functions without holding mmap_sem. So we would like to see the page migration code in vmscan.c not depend on mmap_sem. > be enough to guarantee that the anon_vmas involved cannot be freed The page is locked. Isnt that enough to guarantee that the page cannot be removed from the anon vma? > behind your back (whereas page_referenced and try_to_unmap are called > without any mmap_sem held). So you'd want to add a new flag to > page_lock_anon_vma, to condition whether page_mapped is checked. Is that really necessary? Can we check for page mapped or page locked? > Though I'm not yet certain that that won't have races of its own: > please examine it sceptically. And is it actually guaranteed that > a relevant mmap_sem is held here? Why on earth does vmscan.c contain > EXPORT_SYMBOLs of migrate_page_remove_references, migrate_page_copy, > migrate_page? This is so that filesystems can generate their own migration functions. Filesystem may mantain structures with additional references to the pages being moved and we cannot move pages with buffers without filesystem cooperation. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-26 5:26 ` Christoph Lameter @ 2006-02-26 15:58 ` Hugh Dickins 2006-02-27 15:47 ` Christoph Lameter 0 siblings, 1 reply; 16+ messages in thread From: Hugh Dickins @ 2006-02-26 15:58 UTC (permalink / raw) To: Christoph Lameter; +Cc: akpm, linux-mm On Sat, 25 Feb 2006, Christoph Lameter wrote: > On Sun, 26 Feb 2006, Hugh Dickins wrote: > > > But I think you can avoid it. It looks to me like the mmap_sem of > > an mm containing the pages is held across migrate_pages? That should > > Currently yes, but the hotplug folks may need to use the page > migration functions without holding mmap_sem. So we would like to see the > page migration code in vmscan.c not depend on mmap_sem. I'm afraid you'll have to think up some other way to stabilize the anon_vma in that future extension then. For now mmap_sem covers it. Wish I could dream up a BUG_ON to warn against those future changes. > > be enough to guarantee that the anon_vmas involved cannot be freed > > The page is locked. Isnt that enough to guarantee that the page cannot be > removed from the anon vma? Not at all. We don't have to take page lock when munmapping or exiting mm (though when it's a swap page, we do trylock). Nor would wish to. > > behind your back (whereas page_referenced and try_to_unmap are called > > without any mmap_sem held). So you'd want to add a new flag to > > page_lock_anon_vma, to condition whether page_mapped is checked. > > Is that really necessary? Can we check for page mapped or page locked? Yes. No. > > Though I'm not yet certain that that won't have races of its own: > > please examine it sceptically. And is it actually guaranteed that > > a relevant mmap_sem is held here? Why on earth does vmscan.c contain > > EXPORT_SYMBOLs of migrate_page_remove_references, migrate_page_copy, > > migrate_page? > > This is so that filesystems can generate their own migration functions. > Filesystem may mantain structures with additional references to the pages > being moved and we cannot move pages with buffers without filesystem > cooperation. Hmm. I'd be happier about them if there were some example in the tree of how they should be used from a filesystem: kill the EXPORTs until then? probably too late now, to make that change in 2.6.16. So long as the filesystem only tries to migrate its own pagecache pages, it should be okay (the locking for file pages is not problematic as it is for anonymous - probably because we do insist on locking pages before truncating or clearing the cache). But if it were to try to migrate the anonymous pages COWed into a private file-based vma, without any mmap_sem, then it would be unsafe. Unlikely mistake. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-26 15:58 ` Hugh Dickins @ 2006-02-27 15:47 ` Christoph Lameter 0 siblings, 0 replies; 16+ messages in thread From: Christoph Lameter @ 2006-02-27 15:47 UTC (permalink / raw) To: Hugh Dickins; +Cc: akpm, linux-mm On Sun, 26 Feb 2006, Hugh Dickins wrote: > > This is so that filesystems can generate their own migration functions. > > Filesystem may mantain structures with additional references to the pages > > being moved and we cannot move pages with buffers without filesystem > > cooperation. > > Hmm. I'd be happier about them if there were some example in the tree > of how they should be used from a filesystem: kill the EXPORTs until > then? probably too late now, to make that change in 2.6.16. There is an example in fs/buffer.c and xfs/ext2/ext3 use that sample. > So long as the filesystem only tries to migrate its own pagecache > pages, it should be okay (the locking for file pages is not problematic > as it is for anonymous - probably because we do insist on locking pages > before truncating or clearing the cache). But if it were to try to > migrate the anonymous pages COWed into a private file-based vma, > without any mmap_sem, then it would be unsafe. Unlikely mistake. pagecache pages are not in an anonymous vma and. pte information can be reconsituted from the mappings. So the page migration code currently simply removes the ptes for those. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-26 4:42 ` Hugh Dickins 2006-02-26 5:26 ` Christoph Lameter @ 2006-02-26 5:57 ` Christoph Lameter 2006-02-26 16:07 ` Hugh Dickins 1 sibling, 1 reply; 16+ messages in thread From: Christoph Lameter @ 2006-02-26 5:57 UTC (permalink / raw) To: Hugh Dickins; +Cc: akpm, linux-mm Here is the parameterization you wanted. However, I am still not sure that a check for a valid mapping here is sufficient if the caller has no other means to guarantee that the mapping is not vanishing. If the mapping is removed after the check for the mapping was done then we still have a problem. Or is there some way that RCU can preserve the existence of an anonymous vma? Cannot imagine how that would work. If an rcu free was done on the anonymous vma then it may vanish anytime after page_lock_anon_vma does a rcu unlock. And then we are holding a lock that is located in free space...... page_lock_anon_vma: Add additional parameter to control mapped check It is okay to obtain a anon vma lock for a page that is only mapped via a swap pte to the page. This occurs frequently during page migration. The check for a mapped page (requiring regular ptes pointing to the page) gets in the way. Signed-off-by: Christoph Lameter <clameter@sgi.com> Index: linux-2.6.16-rc4/mm/rmap.c =================================================================== --- linux-2.6.16-rc4.orig/mm/rmap.c 2006-02-17 14:23:45.000000000 -0800 +++ linux-2.6.16-rc4/mm/rmap.c 2006-02-25 21:51:49.000000000 -0800 @@ -187,7 +187,7 @@ void __init anon_vma_init(void) * Getting a lock on a stable anon_vma from a page off the LRU is * tricky: page_lock_anon_vma rely on RCU to guard against the races. */ -static struct anon_vma *page_lock_anon_vma(struct page *page) +static struct anon_vma *page_lock_anon_vma(struct page *page, int check_mapped) { struct anon_vma *anon_vma = NULL; unsigned long anon_mapping; @@ -196,7 +196,15 @@ static struct anon_vma *page_lock_anon_v anon_mapping = (unsigned long) page->mapping; if (!(anon_mapping & PAGE_MAPPING_ANON)) goto out; - if (!page_mapped(page)) + /* + * Mysterious check that may have something to do with the vma + * potentially vanishing if page was the last page in the mapping + * and was just removed. + * + * Check is not necessary if we have another means of guaranteeing + * that the vma is safe. + */ + if (check_mapped && !page_mapped(page)) goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); @@ -222,7 +230,7 @@ void remove_from_swap(struct page *page) if (!PageAnon(page) || !PageSwapCache(page)) return; - anon_vma = page_lock_anon_vma(page); + anon_vma = page_lock_anon_vma(page, 0); if (!anon_vma) return; @@ -359,7 +367,7 @@ static int page_referenced_anon(struct p struct vm_area_struct *vma; int referenced = 0; - anon_vma = page_lock_anon_vma(page); + anon_vma = page_lock_anon_vma(page, 1); if (!anon_vma) return referenced; @@ -737,7 +745,7 @@ static int try_to_unmap_anon(struct page struct vm_area_struct *vma; int ret = SWAP_AGAIN; - anon_vma = page_lock_anon_vma(page); + anon_vma = page_lock_anon_vma(page, 1); if (!anon_vma) return ret; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-26 5:57 ` Christoph Lameter @ 2006-02-26 16:07 ` Hugh Dickins 2006-02-27 15:55 ` Christoph Lameter 0 siblings, 1 reply; 16+ messages in thread From: Hugh Dickins @ 2006-02-26 16:07 UTC (permalink / raw) To: Christoph Lameter; +Cc: akpm, linux-mm On Sat, 25 Feb 2006, Christoph Lameter wrote: > Here is the parameterization you wanted. However, I am still not sure > that a check for a valid mapping here is sufficient if the caller has no > other means to guarantee that the mapping is not vanishing. > > If the mapping is removed after the check for the mapping was done then > we still have a problem. > > Or is there some way that RCU can preserve the existence of an anonymous > vma? > > Cannot imagine how that would work. If an rcu free was done on the > anonymous vma then it may vanish anytime after page_lock_anon_vma does a > rcu unlock. And then we are holding a lock that is located in free > space...... Please see comments on SLAB_DESTROY_BY_RCU in mm/slab.c: that's why the anon_vma cache is created with that flag, that's why page_lock_anon_vma uses rcu_read_lock. Your patch, with more appropriate comments and my signoff added, below (but, in case there's any doubt, it's not suitable for 2.6.16 - the change itself is simple, but it suddenly makes the hitherto untried codepaths of remove_from_swap accessible). Hugh page_lock_anon_vma: Add additional parameter to control mapped check It is okay for page migration's remove_from_swap to obtain anon_vma lock for a page which is currently unmapped, because in its case an mmap_sem is held, which protects the struct anon_vma from being freed. The check for a mapped page prevented remove_from_swap from working until now. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> --- mm/rmap.c | 28 +++++++++++++++++++++++----- 1 files changed, 23 insertions(+), 5 deletions(-) --- 2.6.16-rc4-git9/mm/rmap.c 2006-02-18 12:28:18.000000000 +0000 +++ linux/mm/rmap.c 2006-02-26 15:27:22.000000000 +0000 @@ -186,8 +186,10 @@ void __init anon_vma_init(void) /* * Getting a lock on a stable anon_vma from a page off the LRU is * tricky: page_lock_anon_vma rely on RCU to guard against the races. + * It is for this reason that the anon_vma cache is created above with + * the SLAB_DESTROY_BY_RCU flag: see comment on that flag in mm/slab.c. */ -static struct anon_vma *page_lock_anon_vma(struct page *page) +static struct anon_vma *page_lock_anon_vma(struct page *page, int check_mapped) { struct anon_vma *anon_vma = NULL; unsigned long anon_mapping; @@ -196,7 +198,23 @@ static struct anon_vma *page_lock_anon_v anon_mapping = (unsigned long) page->mapping; if (!(anon_mapping & PAGE_MAPPING_ANON)) goto out; - if (!page_mapped(page)) + /* + * When called from page_referenced_anon or try_to_unmap_anon, + * we have no hold on the struct anon_vma: it might already have + * been freed, or be on its way to being freed. The check below + * on page_mapped ensures that it has not yet been freed, though + * it might still be freed before taking the anon_vma->lock; but + * because we are under rcu_read_lock, and the anon_vma cache is + * marked SLAB_DESTROY_BY_RCU, anon_vma->lock remains safe for + * locking here (and the rest of the structure no worse than + * irrelevant), even if this anon_vma struct has been reused. + * + * But page migration's remove_from_swap needs to take this lock + * when the page has been unmapped, and so must skip around that + * page_mapped check. In this case, the anon_vma is stabilized + * by the down_read(&mm->mmap_sem) in do_migrate_pages. + */ + if (check_mapped && !page_mapped(page)) goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); @@ -222,7 +240,7 @@ void remove_from_swap(struct page *page) if (!PageAnon(page) || !PageSwapCache(page)) return; - anon_vma = page_lock_anon_vma(page); + anon_vma = page_lock_anon_vma(page, 0); if (!anon_vma) return; @@ -359,7 +377,7 @@ static int page_referenced_anon(struct p struct vm_area_struct *vma; int referenced = 0; - anon_vma = page_lock_anon_vma(page); + anon_vma = page_lock_anon_vma(page, 1); if (!anon_vma) return referenced; @@ -737,7 +755,7 @@ static int try_to_unmap_anon(struct page struct vm_area_struct *vma; int ret = SWAP_AGAIN; - anon_vma = page_lock_anon_vma(page); + anon_vma = page_lock_anon_vma(page, 1); if (!anon_vma) return ret; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-26 16:07 ` Hugh Dickins @ 2006-02-27 15:55 ` Christoph Lameter 2006-02-27 16:32 ` Hugh Dickins 0 siblings, 1 reply; 16+ messages in thread From: Christoph Lameter @ 2006-02-27 15:55 UTC (permalink / raw) To: Hugh Dickins; +Cc: akpm, linux-mm On Sun, 26 Feb 2006, Hugh Dickins wrote: > On Sat, 25 Feb 2006, Christoph Lameter wrote: > > Here is the parameterization you wanted. However, I am still not sure > > that a check for a valid mapping here is sufficient if the caller has no > > other means to guarantee that the mapping is not vanishing. > > > > If the mapping is removed after the check for the mapping was done then > > we still have a problem. > > > > Or is there some way that RCU can preserve the existence of an anonymous > > vma? > > > > Cannot imagine how that would work. If an rcu free was done on the > > anonymous vma then it may vanish anytime after page_lock_anon_vma does a > > rcu unlock. And then we are holding a lock that is located in free > > space...... > > Please see comments on SLAB_DESTROY_BY_RCU in mm/slab.c: that's why the > anon_vma cache is created with that flag, that's why page_lock_anon_vma > uses rcu_read_lock. Your patch, with more appropriate comments and my > signoff added, below (but, in case there's any doubt, it's not suitable > for 2.6.16 - the change itself is simple, but it suddenly makes the > hitherto untried codepaths of remove_from_swap accessible). At least my tests show that this codepath is valid and its for new functionality in 2.6.16. So I guess its suitable for 2.6.16. I doubt that RCU can help if the anon_vma is removed after the check for page_mapped. In that case RCU prevents the ultimate free from happening until rcu_read_unlock. So in essence we lock the anon_vma, return a pointer to the anonymous vma and then free the anon_vma? Functions calling page_lock_anon_vma may operate on freed memory due to this race. Should we not check again for page_mapped after taking the lock? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-27 15:55 ` Christoph Lameter @ 2006-02-27 16:32 ` Hugh Dickins 2006-02-27 16:43 ` Christoph Lameter 0 siblings, 1 reply; 16+ messages in thread From: Hugh Dickins @ 2006-02-27 16:32 UTC (permalink / raw) To: Christoph Lameter; +Cc: akpm, linux-mm Thanks for pointing me to the fs use of buffer_migrate_page in other mail. On Mon, 27 Feb 2006, Christoph Lameter wrote: > On Sun, 26 Feb 2006, Hugh Dickins wrote: > > > > Please see comments on SLAB_DESTROY_BY_RCU in mm/slab.c: that's why the > > anon_vma cache is created with that flag, that's why page_lock_anon_vma > > uses rcu_read_lock. Your patch, with more appropriate comments and my > > signoff added, below (but, in case there's any doubt, it's not suitable > > for 2.6.16 - the change itself is simple, but it suddenly makes the > > hitherto untried codepaths of remove_from_swap accessible). > > At least my tests show that this codepath is valid and its for new > functionality in 2.6.16. So I guess its suitable for 2.6.16. Well, it's certainly not for me to decide: I just didn't want my signoff to be interpreted as a request to push it into 2.6.16. It seemed to me rather late to be enabling this new functionality in 2.6.16, even though it's a bug that it wasn't already enabled in 2.6.16-rc: you'll have to argue that one without me. Perhaps it doesn't matter if the vast majority have CONFIG_MIGRATION configured off. > I doubt that RCU can help if the anon_vma is removed after the check for > page_mapped. In that case RCU prevents the ultimate free from happening > until rcu_read_unlock. So in essence we lock the anon_vma, return a > pointer to the anonymous vma and then free the anon_vma? Functions calling > page_lock_anon_vma may operate on freed memory due to this race. I'm not sure that I've understood your doubt correctly. But I think you're missing that rcu_read_lock is just another name for preempt_disable, plus we always disable preemption when taking a spin lock: so in effect we have rcu_read_lock in force until the spin_unlock(&anon_vma->lock). > Should we not check again for page_mapped after taking the lock? There's no need to. But it certainly requires that the functions we go on to call (page_referenced_one, try_to_unmap_one, remove_vma_swap) act in the way they do, checking ptes for a match with page or swap entry: would be dangerous if they ever assumed a match without looking. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-27 16:32 ` Hugh Dickins @ 2006-02-27 16:43 ` Christoph Lameter 2006-02-27 17:23 ` Hugh Dickins 0 siblings, 1 reply; 16+ messages in thread From: Christoph Lameter @ 2006-02-27 16:43 UTC (permalink / raw) To: Hugh Dickins; +Cc: akpm, linux-mm On Mon, 27 Feb 2006, Hugh Dickins wrote: > > At least my tests show that this codepath is valid and its for new > > functionality in 2.6.16. So I guess its suitable for 2.6.16. > > Well, it's certainly not for me to decide: I just didn't want my signoff > to be interpreted as a request to push it into 2.6.16. It seemed to me > rather late to be enabling this new functionality in 2.6.16, even though > it's a bug that it wasn't already enabled in 2.6.16-rc: you'll have to > argue that one without me. Perhaps it doesn't matter if the vast > majority have CONFIG_MIGRATION configured off. There are only a very few users of page migration since this is new functionality. Even those with CONFIG_MIGRATION need to do special modifications to their systems (installiing a new numactl package, setting up their cpusets differently) to use this functionality. Also the functionality in question has been available in the hotplug project in the past. > I'm not sure that I've understood your doubt correctly. But I think > you're missing that rcu_read_lock is just another name for preempt_disable, > plus we always disable preemption when taking a spin lock: so in effect > we have rcu_read_lock in force until the spin_unlock(&anon_vma->lock). That is a rather subtle thing not evident from the code. Add another comment? Or better do the rcu locking before calling page_lock_anon_vma and the unlocking after spin_unlock to have proper nesting of locks? We have a rather confusing rcu_read_unlock in page_lock_anon_vma.... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-27 16:43 ` Christoph Lameter @ 2006-02-27 17:23 ` Hugh Dickins 2006-02-27 18:11 ` Christoph Lameter 0 siblings, 1 reply; 16+ messages in thread From: Hugh Dickins @ 2006-02-27 17:23 UTC (permalink / raw) To: Christoph Lameter; +Cc: akpm, linux-mm On Mon, 27 Feb 2006, Christoph Lameter wrote: > > That is a rather subtle thing not evident from the code. Add another > comment? I won't fight if you insist on doing so, but it's already proclaimed itself to be tricky, with lengthy comments in mm/slab.c and now here. At some stage, I think, we need to stop reading comments and ponder the code itself. > Or better do the rcu locking before calling page_lock_anon_vma > and the unlocking after spin_unlock to have proper nesting of locks? No, page_lock_anon_vma is all about insulating the rest of the code from these difficulties: I do prefer it as is. That said, I had mixed feelings when the name "rcu_read_lock" was introduced: it's not always helpful to distinguish it from preempt_disable in that way. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-27 17:23 ` Hugh Dickins @ 2006-02-27 18:11 ` Christoph Lameter 2006-02-27 18:27 ` Hugh Dickins 0 siblings, 1 reply; 16+ messages in thread From: Christoph Lameter @ 2006-02-27 18:11 UTC (permalink / raw) To: Hugh Dickins; +Cc: akpm, linux-mm On Mon, 27 Feb 2006, Hugh Dickins wrote: > > Or better do the rcu locking before calling page_lock_anon_vma > > and the unlocking after spin_unlock to have proper nesting of locks? > > No, page_lock_anon_vma is all about insulating the rest of the code > from these difficulties: I do prefer it as is. Hmm... How about page_lock_anon_vma and page_unlock_anon_vma? This > That said, I had mixed feelings when the name "rcu_read_lock" was > introduced: it's not always helpful to distinguish it from > preempt_disable in that way. Hmm.. It seems that the rcu implementation has been fluctuating somewhat in the past. I fear that code reviewers will not realize that the freeing of the anon_vma is in fact delayed much longer than a superficial review of the page_lock_anon_vma reveals. How about this patch: page_unlock_anon_vma: cleanup locking and comments page_unlock_anon_vma calls rcu_read_unlock() after a spinlock has been obtained to delay rcu freeing past rcu_read_unlock(). Make this rcu behavior evident by moving the rcu_read_unlock() after the spin_unlock() and add some comments explaining why and how locking works in page_unlock_anon_vma(). Signed-off-by: Christoph Lameter <clameter@sgi.com> Index: linux-2.6.16-rc5/mm/rmap.c =================================================================== --- linux-2.6.16-rc5.orig/mm/rmap.c 2006-02-27 10:04:15.000000000 -0800 +++ linux-2.6.16-rc5/mm/rmap.c 2006-02-27 10:10:25.000000000 -0800 @@ -196,16 +196,36 @@ static struct anon_vma *page_lock_anon_v anon_mapping = (unsigned long) page->mapping; if (!(anon_mapping & PAGE_MAPPING_ANON)) goto out; + + /* + * We do not remove the mapping when we unmap the vmas and the + * anon_vma that may contain this page. + * page->mapping is only set to NULL when the page is finally + * returned to the free pool. + * + * Thus we can only be sure that the mapping is valid if the page + * is still mapped by a process. Should the page become unmapped + * after the check below then rcu locking will preserve the anon_vma + * structure until page_unlock_anon_vma() is called. + */ if (!page_mapped(page)) goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); spin_lock(&anon_vma->lock); + return anon_vma; + out: rcu_read_unlock(); return anon_vma; } +static void page_unlock_anon_vma(struct anon_vma *anon_vma) +{ + spin_unlock(&anon_vma); + rcu_read_unlock(); +} + #ifdef CONFIG_MIGRATION /* * Remove an anonymous page from swap replacing the swap pte's @@ -369,7 +389,7 @@ static int page_referenced_anon(struct p if (!mapcount) break; } - spin_unlock(&anon_vma->lock); + page_unlock_anon_vma(anon_vma); return referenced; } @@ -746,7 +766,7 @@ static int try_to_unmap_anon(struct page if (ret == SWAP_FAIL || !page_mapped(page)) break; } - spin_unlock(&anon_vma->lock); + page_unlock_anon_vma(anon_vma); return ret; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-27 18:11 ` Christoph Lameter @ 2006-02-27 18:27 ` Hugh Dickins 2006-02-27 18:31 ` Christoph Lameter 0 siblings, 1 reply; 16+ messages in thread From: Hugh Dickins @ 2006-02-27 18:27 UTC (permalink / raw) To: Christoph Lameter; +Cc: akpm, linux-mm On Mon, 27 Feb 2006, Christoph Lameter wrote: > On Mon, 27 Feb 2006, Hugh Dickins wrote: > > > > Or better do the rcu locking before calling page_lock_anon_vma > > > and the unlocking after spin_unlock to have proper nesting of locks? > > > > No, page_lock_anon_vma is all about insulating the rest of the code > > from these difficulties: I do prefer it as is. > > Hmm... How about page_lock_anon_vma and page_unlock_anon_vma? This > > I fear that code reviewers will not realize that the freeing of the > anon_vma is in fact delayed much longer than a superficial review of the > page_lock_anon_vma reveals. > > How about this patch: I'd prefer not myself, perhaps someone else likes it. And you haven't even based it on the check_mapped version you sent me, which I then returned to you with more helpful comments. Let's just move on? Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-27 18:27 ` Hugh Dickins @ 2006-02-27 18:31 ` Christoph Lameter 2006-02-27 18:43 ` Hugh Dickins 0 siblings, 1 reply; 16+ messages in thread From: Christoph Lameter @ 2006-02-27 18:31 UTC (permalink / raw) To: Hugh Dickins; +Cc: akpm, linux-mm On Mon, 27 Feb 2006, Hugh Dickins wrote: > > How about this patch: > > I'd prefer not myself, perhaps someone else likes it. > And you haven't even based it on the check_mapped version you sent me, > which I then returned to you with more helpful comments. Well, I thought overall cleanness issue was better dealt with first before we go into that. The check_mapped can be done in a different way since there is no need for an rcu lock because we now have the requirement to hold mmap_sem for protection. If you do not like it then I am fine with your patch: remove_from_swap: Fix locking remove_from_swap currently attempt to use page_lock_anon_vma to obtain an anon_vma lock. That is not working since the page may have been remapped via swap ptes in order to move the page. However, do_migrate_pages() obtain the mmap_sem lock and therefore there is a guarantee that the anonymous vma will not vanish from under us. There is therefore no need to use page_lock_anon_vma. Signed-off-by: Christoph Lameter <clameter@sgi.com> Index: linux-2.6.16-rc5/mm/rmap.c =================================================================== --- linux-2.6.16-rc5.orig/mm/rmap.c 2006-02-27 10:10:25.000000000 -0800 +++ linux-2.6.16-rc5/mm/rmap.c 2006-02-27 10:29:20.000000000 -0800 @@ -232,25 +232,33 @@ static void page_unlock_anon_vma(struct * through real pte's pointing to valid pages and then releasing * the page from the swap cache. * - * Must hold page lock on page. + * Must hold page lock on page and mmap_sem of one vma that contains + * the page. */ void remove_from_swap(struct page *page) { struct anon_vma *anon_vma; struct vm_area_struct *vma; + unsigned long mapping; - if (!PageAnon(page) || !PageSwapCache(page)) + if (!PageSwapCache(page)) return; - anon_vma = page_lock_anon_vma(page); - if (!anon_vma) + mapping = (unsigned long)page->mapping; + + if (!mapping || (mapping & PAGE_MAPPING_ANON) == 0) return; + /* + * We hold the mmap_sem lock. So no need to call page_lock_anon_vma. + */ + anon_vma = (struct anon_vma *) (page->mapping - PAGE_MAPPING_ANON); + spin_lock(&anon_vma->lock); + list_for_each_entry(vma, &anon_vma->head, anon_vma_node) remove_vma_swap(vma, page); spin_unlock(&anon_vma->lock); - delete_from_swap_cache(page); } EXPORT_SYMBOL(remove_from_swap); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: page_lock_anon_vma(): remove check for mapped page 2006-02-27 18:31 ` Christoph Lameter @ 2006-02-27 18:43 ` Hugh Dickins 0 siblings, 0 replies; 16+ messages in thread From: Hugh Dickins @ 2006-02-27 18:43 UTC (permalink / raw) To: Christoph Lameter; +Cc: akpm, linux-mm On Mon, 27 Feb 2006, Christoph Lameter wrote: > > The check_mapped can be done in a different way since there is no need > for an rcu lock because we now have the requirement to hold > mmap_sem for protection. You're right, I hadn't thought of it that way round, you're better off simply avoiding page_lock_anon_vma for your usage. Acked-by: Hugh Dickins <hugh@veritas.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2006-02-27 18:43 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2006-02-25 1:03 page_lock_anon_vma(): remove check for mapped page Christoph Lameter 2006-02-25 14:01 ` Hugh Dickins 2006-02-26 4:42 ` Hugh Dickins 2006-02-26 5:26 ` Christoph Lameter 2006-02-26 15:58 ` Hugh Dickins 2006-02-27 15:47 ` Christoph Lameter 2006-02-26 5:57 ` Christoph Lameter 2006-02-26 16:07 ` Hugh Dickins 2006-02-27 15:55 ` Christoph Lameter 2006-02-27 16:32 ` Hugh Dickins 2006-02-27 16:43 ` Christoph Lameter 2006-02-27 17:23 ` Hugh Dickins 2006-02-27 18:11 ` Christoph Lameter 2006-02-27 18:27 ` Hugh Dickins 2006-02-27 18:31 ` Christoph Lameter 2006-02-27 18:43 ` Hugh Dickins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox