* [PATCH] fix mlocked page counter mismatch
@ 2009-02-02 6:16 MinChan Kim
2009-02-02 17:16 ` Lee Schermerhorn
0 siblings, 1 reply; 5+ messages in thread
From: MinChan Kim @ 2009-02-02 6:16 UTC (permalink / raw)
To: linux mm, Andrew Morton
Cc: linux kernel, Lee Schermerhorn, Nick Piggin, KOSAKI Motohiro
When I tested the following program, I found that the mlocked counter
is wrong: it could not free some mlocked pages of the test program.

The cause is that try_to_unmap_file() does not check whether the page
is really mapped in each vma it visits.  The file's address_space
exists to find all processes into which a given interval of the file
is mapped; it is keyed on the file's interval, not on the page itself.

So even if the page is not really mapped in a vma, try_to_unmap_file()
returns SWAP_MLOCK because the vma has VM_LOCKED, and then calls
try_to_mlock_page().  As a result, the mlocked counter is incremented
again.
This patch is based on 2.6.28-rc2-mm1.
-- my test program --
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	if (mlockall(MCL_CURRENT) != 0)
		perror("mlockall");
	return 0;
}
-- before --
root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable: 0 kB
Mlocked: 0 kB
-- after --
root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable: 8 kB
Mlocked: 8 kB
--
diff --git a/mm/rmap.c b/mm/rmap.c
index 1099394..9ba1fdf 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1073,6 +1073,9 @@ static int try_to_unmap_file(struct page *page, int unlock, int migration)
 	unsigned long max_nl_size = 0;
 	unsigned int mapcount;
 	unsigned int mlocked = 0;
+	unsigned long address;
+	pte_t *pte;
+	spinlock_t *ptl;
 
 	if (MLOCK_PAGES && unlikely(unlock))
 		ret = SWAP_SUCCESS;	/* default for try_to_munlock() */
@@ -1089,6 +1092,13 @@ static int try_to_unmap_file(struct page *page, int unlock, int migration)
 			goto out;
 		}
 		if (ret == SWAP_MLOCK) {
+			address = vma_address(page, vma);
+			if (address != -EFAULT) {
+				pte = page_check_address(page, vma->vm_mm, address, &ptl, 0);
+				if (!pte)
+					continue;
+				pte_unmap_unlock(pte, ptl);
+			}
 			mlocked = try_to_mlock_page(page, vma);
 			if (mlocked)
 				break;	/* stop if actually mlocked page */
--
Kind regards
MinChan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
* Re: [PATCH] fix mlocked page counter mismatch
  2009-02-02  6:16 [PATCH] fix mlocked page counter mismatch MinChan Kim
@ 2009-02-02 17:16 ` Lee Schermerhorn
  2009-02-02 23:27   ` MinChan Kim
  0 siblings, 1 reply; 5+ messages in thread
From: Lee Schermerhorn @ 2009-02-02 17:16 UTC (permalink / raw)
To: MinChan Kim
Cc: linux mm, Andrew Morton, linux kernel, Nick Piggin, KOSAKI Motohiro

On Mon, 2009-02-02 at 15:16 +0900, MinChan Kim wrote:
> When I tested following program, I found that mlocked counter
> is strange.
> It couldn't free some mlocked pages of test program.
> [explanation, test program, and patch snipped]

Hi, MinChan:

Interestingly, Rik had addressed this [simpler patch below] way back
when he added the page_mapped_in_vma() function.  I asked him whether
the rb tree shouldn't have filtered out any vmas that didn't have the
page mapped.  He agreed and removed the check from try_to_unmap_file().
Guess I can be very convincing, even when I'm wrong [happening a lot
lately].

Of course, in this instance, the rb-tree filtering only works for
shared, page-cache pages.  The problem uncovered by your test case is
with a COWed anon page in a file-backed vma.  Yes, the vma 'maps' the
virtual address range containing the page in question, but since it's
a private COWed anon page, it isn't necessarily "mapped" in the
VM_LOCKED vma's mm's page table.  We need the check...

I've added the variant below [CURRENTLY UNTESTED] to my test tree.

Lee

[intentionally omitted sign off, until tested.]

Index: linux-2.6.29-rc3/mm/rmap.c
===================================================================
--- linux-2.6.29-rc3.orig/mm/rmap.c	2009-01-30 14:13:56.000000000 -0500
+++ linux-2.6.29-rc3/mm/rmap.c	2009-02-02 11:27:11.000000000 -0500
@@ -1072,7 +1072,8 @@ static int try_to_unmap_file(struct page
 	spin_lock(&mapping->i_mmap_lock);
 	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
 		if (MLOCK_PAGES && unlikely(unlock)) {
-			if (!(vma->vm_flags & VM_LOCKED))
+			if (!((vma->vm_flags & VM_LOCKED) &&
+			      page_mapped_in_vma(page, vma)))
 				continue;	/* must visit all vmas */
 			ret = SWAP_MLOCK;
 		} else {
* Re: [PATCH] fix mlocked page counter mismatch
  2009-02-02 17:16 ` Lee Schermerhorn
@ 2009-02-02 23:27   ` MinChan Kim
  2009-02-03  1:48     ` Lee Schermerhorn
  0 siblings, 1 reply; 5+ messages in thread
From: MinChan Kim @ 2009-02-02 23:27 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: linux mm, Andrew Morton, linux kernel, Nick Piggin, KOSAKI Motohiro

On Mon, Feb 02, 2009 at 12:16:35PM -0500, Lee Schermerhorn wrote:
> [snip]
> Of course, in this instance, the rb-tree filtering only works for
> shared, page-cache pages.  The problem uncovered by your test case

It's not an rb-tree but a priority tree. ;-)

> is with a COWed anon page in a file-backed vma.  Yes, the vma 'maps'
> the virtual address range containing the page in question, but since
> it's a private COWed anon page, it isn't necessarily "mapped" in the
> VM_LOCKED vma's mm's page table.  We need the check...

Indeed!

> I've added the variant below [CURRENTLY UNTESTED] to my test tree.
>
> Lee
>
> [intentionally omitted sign off, until tested.]

I shouldn't have forgotten page_mapped_in_vma().
Anyway, it looks good to me.
Thank you for testing.

> [patch snipped]

Kind regards
MinChan Kim
* Re: [PATCH] fix mlocked page counter mismatch
  2009-02-02 23:27   ` MinChan Kim
@ 2009-02-03  1:48     ` Lee Schermerhorn
  2009-02-03  1:57       ` MinChan Kim
  0 siblings, 1 reply; 5+ messages in thread
From: Lee Schermerhorn @ 2009-02-03 1:48 UTC (permalink / raw)
To: MinChan Kim
Cc: linux mm, Andrew Morton, linux kernel, Nick Piggin, KOSAKI Motohiro

On Tue, 2009-02-03 at 08:27 +0900, MinChan Kim wrote:
> [snip]
> It's not an rb-tree but a priority tree. ;-)

Yeah, that one :).

> I shouldn't have forgotten page_mapped_in_vma().
> Anyway, it looks good to me.
> Thank you for testing.

I did test this on 29-rc3 on a 4 socket x dual core x86_64 platform,
and it seems to resolve the statistics miscount.  How do you want to
proceed?  Do you want to repost with this version of the patch, or
shall I?

Regards,
Lee

> [patch snipped]
* Re: [PATCH] fix mlocked page counter mismatch
  2009-02-03  1:48     ` Lee Schermerhorn
@ 2009-02-03  1:57       ` MinChan Kim
  0 siblings, 0 replies; 5+ messages in thread
From: MinChan Kim @ 2009-02-03 1:57 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: linux mm, Andrew Morton, linux kernel, Nick Piggin, KOSAKI Motohiro

> > Thank you for testing.
>
> I did test this on 29-rc3 on a 4 socket x dual core x86_64 platform,
> and it seems to resolve the statistics miscount.  How do you want to
> proceed?  Do you want to repost with this version of the patch, or
> shall I?

I will repost this patch with your Acked-by and Tested-by, if you
allow it.

> [patch snipped]

Kind regards
MinChan Kim
end of thread, other threads:[~2009-02-03  1:58 UTC | newest]

Thread overview: 5+ messages
2009-02-02  6:16 [PATCH] fix mlocked page counter mismatch MinChan Kim
2009-02-02 17:16 ` Lee Schermerhorn
2009-02-02 23:27   ` MinChan Kim
2009-02-03  1:48     ` Lee Schermerhorn
2009-02-03  1:57       ` MinChan Kim