* [BUG] mlocked page counter mismatch
@ 2009-01-28 10:28 MinChan Kim
2009-01-28 14:38 ` KOSAKI Motohiro
2009-01-28 15:33 ` Lee Schermerhorn
0 siblings, 2 replies; 9+ messages in thread
From: MinChan Kim @ 2009-01-28 10:28 UTC (permalink / raw)
To: linux mm; +Cc: linux kernel, Lee Schermerhorn, Nick Piggin
After executing the following program, 'cat /proc/meminfo' shows
the following result.
--
# cat /proc/meminfo
..
Unevictable: 8 kB
Mlocked: 8 kB
..
-- test program --
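#include <stdio.h>
#include <stdlib.h>	/* system() */
#include <unistd.h>	/* getpid() */
#include <sys/mman.h>	/* mlockall() */
int main(void)
{
	char buf[64] = {0,};
	/* unused locals, kept as in the original report */
	char *ptr;
	int k;
	int i,j;
	int x,y;
	/* lock every page currently mapped by this process */
	mlockall(MCL_CURRENT);
	/* show this process's mappings while they are locked */
	sprintf(buf, "cat /proc/%d/maps", getpid());
	system(buf);
	return 0;
}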
--
It seems the mlocked page counter has a problem.
After I dug into the source, I found that try_to_unmap_file called
try_to_mlock_page for shared mapping pages
because another vma had the VM_LOCKED flag.
In the end, try_to_mlock_page called mlock_vma_page,
so the mlocked counter increased.
But after I called munlockall intentionally, the counter worked well.
In the munlockall case, we already hold mmap_sem for write.
In that case, try_to_mlock_page can't call mlock_vma_page,
so the mlocked counter didn't increase.
As a result, the counter seems to work well, but I think
this case also has a problem.
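
For anyone reproducing the report, the two /proc/meminfo fields can be read programmatically before and after the test. The following is only a minimal sketch: `meminfo_kb` and `kb_to_pages` are hypothetical helper names that are not part of the original report, and the page conversion assumes 4 kB pages.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value in kB of `key` (e.g. "Mlocked")
 * from /proc/meminfo-style text, or -1 if the key is not present.
 */
static long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		long kb;

		/* a matching line starts with "<key>:" followed by a number */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
		    sscanf(line + klen + 1, "%ld", &kb) == 1)
			return kb;
		line = strchr(line, '\n');
		if (line)
			line++;		/* step past the newline */
	}
	return -1;
}

/* Convert kB to pages, assuming 4 kB pages (an assumption, not from the report). */
static long kb_to_pages(long kb)
{
	return kb / 4;
}
```

Reading the whole of /proc/meminfo into a buffer and passing it to `meminfo_kb(buf, "Mlocked")` before and after the test program would give the two values to compare; under the 4 kB assumption, the 8 kB shown above corresponds to 2 pages.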
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [BUG] mlocked page counter mismatch
2009-01-28 10:28 [BUG] mlocked page counter mismatch MinChan Kim
@ 2009-01-28 14:38 ` KOSAKI Motohiro
2009-01-28 15:33 ` Lee Schermerhorn
1 sibling, 0 replies; 9+ messages in thread
From: KOSAKI Motohiro @ 2009-01-28 14:38 UTC (permalink / raw)
To: MinChan Kim; +Cc: linux mm, linux kernel, Lee Schermerhorn, Nick Piggin

2009/1/28 MinChan Kim <minchan.kim@gmail.com>:
>
> After executing following program, 'cat /proc/meminfo' shows
> following result.
>
> --
> # cat /proc/meminfo
> ..
> Unevictable: 8 kB
> Mlocked: 8 kB
> ..

OK, I'll handle this bug.
* Re: [BUG] mlocked page counter mismatch
2009-01-28 10:28 [BUG] mlocked page counter mismatch MinChan Kim
2009-01-28 14:38 ` KOSAKI Motohiro
@ 2009-01-28 15:33 ` Lee Schermerhorn
2009-01-28 23:55 ` MinChan Kim
1 sibling, 1 reply; 9+ messages in thread
From: Lee Schermerhorn @ 2009-01-28 15:33 UTC (permalink / raw)
To: MinChan Kim; +Cc: linux mm, linux kernel, Nick Piggin, KOSAKI Motohiro

On Wed, 2009-01-28 at 19:28 +0900, MinChan Kim wrote:
> After executing following program, 'cat /proc/meminfo' shows
> following result.
>
> --
> # cat /proc/meminfo
> ..
> Unevictable: 8 kB
> Mlocked: 8 kB
> ..

Sorry, from your description, I can't understand what the problem is.
Are you saying that 8kB [2 pages] remains locked after you run your
test?

What did meminfo show before running the test program? And what kernel
version?

>
> -- test program --
>
> #include <stdio.h>
> #include <sys/mman.h>
> int main()
> {
> char buf[64] = {0,};
> char *ptr;
> int k;
> int i,j;
> int x,y;
> mlockall(MCL_CURRENT);
> sprintf(buf, "cat /proc/%d/maps", getpid());
> system(buf);
> return 0;
> }
>
> --
>
> It seems mlocked page counter have a problem.
> After I diged in source, I found that try_to_unmap_file called
> try_to_mlock_page about shared mapping pages
> since other vma had VM_LOCKED flag.
> After all, try_to_mlock_page called mlock_vma_page.
> so, mlocked counter increased

This path of try_to_unmap_file() -> try_to_mlock_page() should only be
invoked during reclaim--from shrink_page_list(). [try_to_unmap() is
also called from page migration, but in this case, try_to_unmap_one()
won't return SWAP_MLOCK so we don't call try_to_mlock_page().] Unless
your system is in continuous reclaim, I don't think you'd hit this
during your test program.

>
> But, After I called munlockall intentionally, the counter work well.
> In case of munlockall, we already had a mmap_sem about write.
> Such a case, try_to_mlock_page can't call mlock_vma_page.
> so, mlocked counter didn't increased.
> As a result, the counter seems to be work well but I think
> it also have a problem.

I THINK this is an artifact of the way stats are accumulated in per cpu
differential counters and pushed to the zone state accumulators when a
threshold is reached. I've seen this condition before, but it
eventually clears itself as the stats get pushed to the zone state.
Still, it bears more investigation, as it's been a while since I worked
on this and some subsequent fixes could have broken it:

I ran your test program on one of our x86_64 test systems running
2.6.29-rc2-mmotm-090116-1618, immediately after boot. Here's what I
saw:

## before:
#cat /proc/meminfo | egrep 'Unev|Mlo'
Unevictable: 3448 kB
Mlocked: 3448 kB        # = 862 4k pages

This is the usual case on this platform. I'm using a RHEL5 distro
installation and one of the system daemons mlocks itself. I forget
which. I'll need to investigate further. Also, it's possible that this
value itself is lower than the actual number of mlocked pages because
some of the counts may still be in the per cpu differential counters.

#tail -8 /proc/vmstat
unevictable_pgs_culled 738
unevictable_pgs_scanned 0
unevictable_pgs_rescued 117     # = 89 + 28 - OK
unevictable_pgs_mlocked 979     # 979 - 117 = 862 remaining locked
unevictable_pgs_munlocked 89
unevictable_pgs_cleared 28
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

So far, so good. Now, run your test:

#./mck-mlock-test
<snip the map output>

#cat /proc/meminfo | egrep 'Unev|Mlo'
Unevictable: 3460 kB
Mlocked: 3460 kB        # = 865 pages; 3 more than above

# tail -8 /proc/vmstat
unevictable_pgs_culled 757
unevictable_pgs_scanned 0
unevictable_pgs_rescued 154     # = 125 + 29 - OK
unevictable_pgs_mlocked 1374    # 1374 - 154 = 1220 ????
unevictable_pgs_munlocked 125
unevictable_pgs_cleared 29
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

So, we have 3 additional pages shown as unevictable; and our stats don't
add up. We see way more pages mlocked than munlocked/cleared.
[Aside: clear happens on file truncation and COW of mlocked pages.] I
wonder if this is the result of removing some of the lru_add_drain_all()
calls that we used to have in the mlock code to improve the statistics.
We don't seem to have stranded any pages--that is, left them unevictable
because we couldn't isolate them from the lru for munlock.

If I drop caches, or run a moderately heavy mlock test--both of which
generate quite a bit of zone and vmstat activity--the meminfo values
become:

Unevictable: 3456 kB
Mlocked: 3456 kB

Which is two more mlocked pages than we saw right after boot. If I
rerun your test repeatedly, the values always show as 3460kB. Dropping
caches always restores it to 3456kB. This may be the correct value with
the per cpu differential values pushed.

You could try dropping the page cache and see what the values are on
your system. You could also add the following line to your test program
before the call to mlockall() and after the existing call to system():

system("cat /proc/meminfo | egrep 'Unev|Mlo'");

I will add this to my list of things to be investigated, but I won't get
to it for a while. If I see more evidence that the counters are,
indeed, broken, I'll try to bump the priority.

Thanks,
Lee
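
The lag that per cpu differential counters can introduce, as described in the message above, can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's implementation: `struct counter`, `counter_add`, `counter_flush`, `NCPU` and `THRESHOLD` are hypothetical names and values chosen for illustration.

```c
#define NCPU      2	/* illustrative, not the kernel's values */
#define THRESHOLD 4

/* Toy model of a statistic kept as a global value plus per-CPU deltas. */
struct counter {
	long global;		/* what a reader of the global stat sees */
	long diff[NCPU];	/* per-CPU deltas not yet folded in */
};

/* Apply delta on one CPU; fold into the global only past the threshold. */
static void counter_add(struct counter *c, int cpu, long delta)
{
	c->diff[cpu] += delta;
	if (c->diff[cpu] > THRESHOLD || c->diff[cpu] < -THRESHOLD) {
		c->global += c->diff[cpu];
		c->diff[cpu] = 0;
	}
}

/* Fold every pending per-CPU delta into the global value. */
static void counter_flush(struct counter *c)
{
	for (int cpu = 0; cpu < NCPU; cpu++) {
		c->global += c->diff[cpu];
		c->diff[cpu] = 0;
	}
}
```

In this model, three increments on one CPU and one decrement on another leave `global` at 0 until `counter_flush()` runs, after which it reads 2. A small, temporary discrepancy of that kind is the effect suspected above; the model says nothing about whether it explains the numbers in this thread.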
* Re: [BUG] mlocked page counter mismatch
2009-01-28 15:33 ` Lee Schermerhorn
@ 2009-01-28 23:55 ` MinChan Kim
2009-01-28 23:57 ` MinChan Kim
2009-01-29 1:48 ` Lee Schermerhorn
0 siblings, 2 replies; 9+ messages in thread
From: MinChan Kim @ 2009-01-28 23:55 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: linux mm, linux kernel, Nick Piggin, KOSAKI Motohiro

On Wed, Jan 28, 2009 at 10:33:52AM -0500, Lee Schermerhorn wrote:
> On Wed, 2009-01-28 at 19:28 +0900, MinChan Kim wrote:
> > After executing following program, 'cat /proc/meminfo' shows
> > following result.
> >
> > --
> > # cat /proc/meminfo
> > ..
> > Unevictable: 8 kB
> > Mlocked: 8 kB
> > ..
>
> Sorry, from your description, I can't understand what the problem is.
> Are you saying that 8kB [2 pages] remains locked after you run your
> test?

Yes.
Sorry. My explanation was not enough.

>
> What did meminfo show before running the test program? And what kernel
> version?

meminfo showed that Mlocked and Unevictable were both zero.
My kernel version is 2.6.29-rc2.

>
> >
> > -- test program --
> >
> > #include <stdio.h>
> > #include <sys/mman.h>
> > int main()
> > {
> > char buf[64] = {0,};
> > char *ptr;
> > int k;
> > int i,j;
> > int x,y;
> > mlockall(MCL_CURRENT);
> > sprintf(buf, "cat /proc/%d/maps", getpid());
> > system(buf);
> > return 0;
> > }
> >
> > --
> >
> > It seems mlocked page counter have a problem.
> > After I diged in source, I found that try_to_unmap_file called
> > try_to_mlock_page about shared mapping pages
> > since other vma had VM_LOCKED flag.
> > After all, try_to_mlock_page called mlock_vma_page.
> > so, mlocked counter increased
>
> This path of try_to_unmap_file() -> try_to_mlock_page() should only be
> invoked during reclaim--from shrink_page_list(). [try_to_unmap() is
> also called from page migration, but in this case, try_to_unmap_one()
> won't return SWAP_MLOCK so we don't call try_to_mlock_page().] Unless
> your system is in continuous reclaim, I don't think you'd hit this
> during your test program.

My system was not in reclaim. It can be reached through the following path:
exit_mmap -> munlock_vma_pages_all->munlock_vma_page->try_to_munlock->
try_to_unmap_file->try_to_mlock_page

>
> >
> > But, After I called munlockall intentionally, the counter work well.
> > In case of munlockall, we already had a mmap_sem about write.
> > Such a case, try_to_mlock_page can't call mlock_vma_page.
> > so, mlocked counter didn't increased.
> > As a result, the counter seems to be work well but I think
> > it also have a problem.
>
> I THINK this is a artifact of the way stats are accumulated in per cpu
> differential counters and pushed to the zone state accumulators when a
> threshold is reached. I've seen this condition before, but it
> eventually clears itself as the stats get pushed to the zone state.
> Still, it bears more investigation, as it's been a while since I worked
> on this and some subsequent fixes could have broken it:

Hmm... My test results are as follows.

1) without munlockall

before:

root@barrios-target-linux:~# tail -8 /proc/vmstat
unevictable_pgs_culled 0
unevictable_pgs_scanned 0
unevictable_pgs_rescued 0
unevictable_pgs_mlocked 0
unevictable_pgs_munlocked 0
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable: 0 kB
Mlocked: 0 kB

after:

root@barrios-target-linux:~# tail -8 /proc/vmstat
unevictable_pgs_culled 369
unevictable_pgs_scanned 0
unevictable_pgs_rescued 388
unevictable_pgs_mlocked 392
unevictable_pgs_munlocked 387
unevictable_pgs_cleared 1
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable: 8 kB
Mlocked: 8 kB

after dropping cache:

root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable: 4 kB
Mlocked: 4 kB

2) with munlockall

before:

barrios-target@barrios-target-linux:~$ tail -8 /proc/vmstat
unevictable_pgs_culled 0
unevictable_pgs_scanned 0
unevictable_pgs_rescued 0
unevictable_pgs_mlocked 0
unevictable_pgs_munlocked 0
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

barrios-target@barrios-target-linux:~$ cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable: 0 kB
Mlocked: 0 kB

after:

root@barrios-target-linux:~# tail -8 /proc/vmstat
unevictable_pgs_culled 369
unevictable_pgs_scanned 0
unevictable_pgs_rescued 389
unevictable_pgs_mlocked 389
unevictable_pgs_munlocked 389
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable: 0 kB
Mlocked: 0 kB

Both tests should show the same result,
but they didn't.

I think it's not a per-cpu problem.

When I dug into the source, I found the following.
In the test without munlockall, try_to_unmap_file calls try_to_mlock_page
because some pages are mapped by several vmas. (I don't know why those pages
are shared with another vma in the same process. One of the pages I saw is the
test program's first code page [page->index : 0 vma->vm_pgoff : 0]. It was
mapped by the data vma, too.)
The other vma has VM_LOCKED, so try_to_unmap_file calls try_to_mlock_page.
Then, after down_read_trylock succeeds, mlock_vma_page increases the mlocked
counter again.

In the test with munlockall, try_to_mlock_page's down_read_trylock
cannot succeed, because munlockall has called down_write.
As a result, try_to_mlock_page cannot call mlock_vma_page, so the mlocked
counter doesn't increase. I think that's not right,
but fortunately the Mlocked number is right. :(

--
Kind regards,
MinChan Kim
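
The bookkeeping used earlier in the thread (mlocked events minus munlocked and cleared events) can be applied to the two runs above. This is a sketch of that arithmetic only; `struct mlock_stats` and `mlock_residual` are hypothetical names, and treating the difference as "pages that should still be mlocked" is an assumption about how the event counters relate.

```c
/* The three /proc/vmstat event counters used in the thread's arithmetic. */
struct mlock_stats {
	long mlocked;	/* unevictable_pgs_mlocked */
	long munlocked;	/* unevictable_pgs_munlocked */
	long cleared;	/* unevictable_pgs_cleared */
};

/* Pages the event counters suggest are still mlocked. */
static long mlock_residual(const struct mlock_stats *s)
{
	return s->mlocked - (s->munlocked + s->cleared);
}
```

With the figures posted above, the run without munlockall gives 392 - (387 + 1) = 4 and the run with munlockall gives 389 - (389 + 0) = 0, so the two runs differ in the event counters as well as in the Mlocked value.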
* Re: [BUG] mlocked page counter mismatch
2009-01-28 23:55 ` MinChan Kim
@ 2009-01-28 23:57 ` MinChan Kim
2009-01-29 1:48 ` Lee Schermerhorn
1 sibling, 0 replies; 9+ messages in thread
From: MinChan Kim @ 2009-01-28 23:57 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: linux mm, linux kernel, Nick Piggin, KOSAKI Motohiro

I forgot to include my test program.

#include <stdio.h>
#include <sys/mman.h>
int main()
{
	mlockall(MCL_CURRENT);
	// munlockall();
	return 0;
}

--
Kind regards,
MinChan Kim
* Re: [BUG] mlocked page counter mismatch
2009-01-28 23:55 ` MinChan Kim
2009-01-28 23:57 ` MinChan Kim
@ 2009-01-29 1:48 ` Lee Schermerhorn
2009-01-29 4:29 ` MinChan Kim
2009-01-29 12:35 ` KOSAKI Motohiro
1 sibling, 2 replies; 9+ messages in thread
From: Lee Schermerhorn @ 2009-01-29 1:48 UTC (permalink / raw)
To: MinChan Kim; +Cc: linux mm, linux kernel, Nick Piggin, KOSAKI Motohiro

On Thu, 2009-01-29 at 08:55 +0900, MinChan Kim wrote:
> On Wed, Jan 28, 2009 at 10:33:52AM -0500, Lee Schermerhorn wrote:
> > On Wed, 2009-01-28 at 19:28 +0900, MinChan Kim wrote:
> > > After executing following program, 'cat /proc/meminfo' shows
> > > following result.
> > >
> > > --
> > > # cat /proc/meminfo
> > > ..
> > > Unevictable: 8 kB
> > > Mlocked: 8 kB
> > > ..
> >
> > Sorry, from your description, I can't understand what the problem is.
> > Are you saying that 8kB [2 pages] remains locked after you run your
> > test?
>
> Yes.
> Sorry. My explanation was not enought.
>
> >
> > What did meminfo show before running the test program? And what kernel
> > version?
>
> The meminfo showed mlocked, unevictable pages was zero.
> My kernel version is 2.6.29-rc2.

OK, thanks.

>
> >
> > >
> > > -- test program --
> > >
> > > #include <stdio.h>
> > > #include <sys/mman.h>
> > > int main()
> > > {
> > > char buf[64] = {0,};
> > > char *ptr;
> > > int k;
> > > int i,j;
> > > int x,y;
> > > mlockall(MCL_CURRENT);
> > > sprintf(buf, "cat /proc/%d/maps", getpid());
> > > system(buf);
> > > return 0;
> > > }
> > >
> > > --
> > >
> > > It seems mlocked page counter have a problem.
> > > After I diged in source, I found that try_to_unmap_file called
> > > try_to_mlock_page about shared mapping pages
> > > since other vma had VM_LOCKED flag.
> > > After all, try_to_mlock_page called mlock_vma_page.
> > > so, mlocked counter increased
> >
> > This path of try_to_unmap_file() -> try_to_mlock_page() should only be
> > invoked during reclaim--from shrink_page_list(). [try_to_unmap() is
> > also called from page migration, but in this case, try_to_unmap_one()
> > won't return SWAP_MLOCK so we don't call try_to_mlock_page().] Unless
> > your system is in continuous reclaim, I don't think you'd hit this
> > during your test program.
>
> My system was not reclaim mode. It could be called following path.
> exit_mmap -> munlock_vma_pages_all->munlock_vma_page->try_to_munlock->
> try_to_unmap_file->try_to_mlock_page

Ah. Yes. Well, try_to_mlock_page() should only call mlock_vma_page()
if some other vma that maps the pages is VM_LOCKED. The vma in the task
calling try_to_munlock() should have already cleared VM_LOCKED for the
vma. However, we need to ensure that the page is actually mapped in the
address range of any VM_LOCKED vma. I recall that Rik discovered this
back during testing and fixed it, but perhaps it was another path.

Looks at code again....

I think I see it. In try_to_unmap_anon(), called from try_to_munlock(),
we have:

        list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
                if (MLOCK_PAGES && unlikely(unlock)) {
                        if (!((vma->vm_flags & VM_LOCKED) &&
!!! should be '||' ?                                      ^^
                              page_mapped_in_vma(page, vma)))
                                continue;  /* must visit all unlocked vmas */
                        ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
                } else {
                        ret = try_to_unmap_one(page, vma, migration);
                        if (ret == SWAP_FAIL || !page_mapped(page))
                                break;
                }
                if (ret == SWAP_MLOCK) {
                        mlocked = try_to_mlock_page(page, vma);
                        if (mlocked)
                                break;  /* stop if actually mlocked page */
                }
        }

or that clause [under if (MLOCK_PAGES && unlikely(unlock))]
might be clearer as:

        if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma))
                ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
        else
                continue;  /* must visit all unlocked vmas */

Do you agree?

And, I wonder if we need a similar check for
page_mapped_in_vma(page, vma) up in try_to_unmap_one()?

>
> >
> > >
> > > But, After I called munlockall intentionally, the counter work well.
> > > In case of munlockall, we already had a mmap_sem about write.
> > > Such a case, try_to_mlock_page can't call mlock_vma_page.
> > > so, mlocked counter didn't increased.
> > > As a result, the counter seems to be work well but I think
> > > it also have a problem.
> >
> > I THINK this is a artifact of the way stats are accumulated in per cpu
> > differential counters and pushed to the zone state accumulators when a
> > threshold is reached. I've seen this condition before, but it
> > eventually clears itself as the stats get pushed to the zone state.
> > Still, it bears more investigation, as it's been a while since I worked
> > on this and some subsequent fixes could have broken it:
>
> Hmm... My test result is as follow.
>
> 1) without munlockall
> before:
>
> root@barrios-target-linux:~# tail -8 /proc/vmstat
> unevictable_pgs_culled 0
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 0
> unevictable_pgs_mlocked 0
> unevictable_pgs_munlocked 0
> unevictable_pgs_cleared 0
> unevictable_pgs_stranded 0
> unevictable_pgs_mlockfreed 0
>
> root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> Unevictable: 0 kB
> Mlocked: 0 kB
>
> after:
> root@barrios-target-linux:~# tail -8 /proc/vmstat
> unevictable_pgs_culled 369
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 388
> unevictable_pgs_mlocked 392
> unevictable_pgs_munlocked 387
> unevictable_pgs_cleared 1

this looks like either some task forked and COWed an anon page--perhaps
a stack page--or truncated a mlocked, mmaped file.

> unevictable_pgs_stranded 0
> unevictable_pgs_mlockfreed 0
>
> root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> Unevictable: 8 kB
> Mlocked: 8 kB
>
> after dropping cache
>
> root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> Unevictable: 4 kB
> Mlocked: 4 kB

Same effect I was seeing. Two extra mlock counts until we drop cache.
Then only 1. Interesting.

>
>
> 2) with munlockall
>
> barrios-target@barrios-target-linux:~$ tail -8 /proc/vmstat
> unevictable_pgs_culled 0
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 0
> unevictable_pgs_mlocked 0
> unevictable_pgs_munlocked 0
> unevictable_pgs_cleared 0
> unevictable_pgs_stranded 0
> unevictable_pgs_mlockfreed 0
>
> barrios-target@barrios-target-linux:~$ cat /proc/meminfo | egrep 'Mlo|Unev'
> Unevictable: 0 kB
> Mlocked: 0 kB
>
> after
>
>
> root@barrios-target-linux:~# tail -8 /proc/vmstat
> unevictable_pgs_culled 369
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 389
> unevictable_pgs_mlocked 389
> unevictable_pgs_munlocked 389
> unevictable_pgs_cleared 0
> unevictable_pgs_stranded 0
> unevictable_pgs_mlockfreed 0
>
> root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> Unevictable: 0 kB
> Mlocked: 0 kB
>
> Both tests have to show same result.
> But is didn't.
>
> I think it's not per-cpu problem.
>
> When I digged in the source, I found that.
> In case of test without munlockall, try_to_unmap_file calls try_to_mlock_page

This I don't understand. exit_mmap() calls munlock_vma_pages_all() for
all VM_LOCKED vmas. This should have the same effect as calling
mlock_fixup() without VM_LOCKED flags, which munlockall() does.

> since some pages are mapped several vmas(I don't know why that pages is shared
> other vma in same process.

Isn't necessarily in the same task. We're traversing the list of vma's
associated with a single anon_vma. This includes all ancestors and
descendants that haven't exec'd. Of course, I don't see a fork() in
either of your test programs, so I don't know what's happening.

> One of page which i have seen is test program's
> first code page[page->index : 0 vma->vm_pgoff : 0]. It was mapped by data vma, too).
> Other vma have VM_LOCKED.
> So try_to_unmap_file calls try_to_mlock_page. Then, After calling
> successful down_read_try_lock call, mlock_vma_page increased mlocked
> counter again.
>
> In case of test with munlockall, try_to_mlock_page's down_read_trylock
> couldn't be sucessful. That's because munlockall called down_write.
> At result, try_to_mlock_page cannot call try_to_mlock_page. so, mlocked counter
> don't increased. I think it's not right.
> But fortunately Mlocked number is right. :(

I'll try with your 2nd test program [sent via separate mail] and try the
fix above. I also want to understand the difference between exit_mmap()
for a task that called mlockall() and the munlockall() case.

Regards,
Lee
* Re: [BUG] mlocked page counter mismatch
2009-01-29 1:48 ` Lee Schermerhorn
@ 2009-01-29 4:29 ` MinChan Kim
2009-01-29 12:35 ` KOSAKI Motohiro
1 sibling, 0 replies; 9+ messages in thread
From: MinChan Kim @ 2009-01-29 4:29 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: linux mm, linux kernel, Nick Piggin, KOSAKI Motohiro

Sorry for the late response.

> Looks at code again....
>
> I think I see it. In try_to_unmap_anon(), called from try_to_munlock(),
> we have:
>
>         list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
>                 if (MLOCK_PAGES && unlikely(unlock)) {
>                         if (!((vma->vm_flags & VM_LOCKED) &&
> !!! should be '||' ?                                      ^^
>                               page_mapped_in_vma(page, vma)))
>                                 continue;  /* must visit all unlocked vmas */
>                         ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
>                 } else {
>                         ret = try_to_unmap_one(page, vma, migration);
>                         if (ret == SWAP_FAIL || !page_mapped(page))
>                                 break;
>                 }
>                 if (ret == SWAP_MLOCK) {
>                         mlocked = try_to_mlock_page(page, vma);
>                         if (mlocked)
>                                 break;  /* stop if actually mlocked page */
>                 }
>         }
>
> or that clause [under if (MLOCK_PAGES && unlikely(unlock))]
> might be clearer as:
>
>         if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma))
>                 ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
>         else
>                 continue;  /* must visit all unlocked vmas */
>
>
> Do you agree?

I agree. This is clearer.
We have to check other processes' vmas that are linked through the anon_vma.
We also have to check this in try_to_unmap_file.

>
> And, I wonder if we need a similar check for
> page_mapped_in_vma(page, vma) up in try_to_unmap_one()?
>
> >
> > >
> > > >
> > > > But, After I called munlockall intentionally, the counter work well.
> > > > In case of munlockall, we already had a mmap_sem about write.
> > > > Such a case, try_to_mlock_page can't call mlock_vma_page.
> > > > so, mlocked counter didn't increased.
> > > > As a result, the counter seems to be work well but I think
> > > > it also have a problem.
> > >
> > > I THINK this is a artifact of the way stats are accumulated in per cpu
> > > differential counters and pushed to the zone state accumulators when a
> > > threshold is reached. I've seen this condition before, but it
> > > eventually clears itself as the stats get pushed to the zone state.
> > > Still, it bears more investigation, as it's been a while since I worked
> > > on this and some subsequent fixes could have broken it:
> >
> > Hmm... My test result is as follow.
> >
> > 1) without munlockall
> > before:
> >
> > root@barrios-target-linux:~# tail -8 /proc/vmstat
> > unevictable_pgs_culled 0
> > unevictable_pgs_scanned 0
> > unevictable_pgs_rescued 0
> > unevictable_pgs_mlocked 0
> > unevictable_pgs_munlocked 0
> > unevictable_pgs_cleared 0
> > unevictable_pgs_stranded 0
> > unevictable_pgs_mlockfreed 0
> >
> > root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> > Unevictable: 0 kB
> > Mlocked: 0 kB
> >
> > after:
> > root@barrios-target-linux:~# tail -8 /proc/vmstat
> > unevictable_pgs_culled 369
> > unevictable_pgs_scanned 0
> > unevictable_pgs_rescued 388
> > unevictable_pgs_mlocked 392
> > unevictable_pgs_munlocked 387
> > unevictable_pgs_cleared 1
>
> this looks like either some task forked and COWed an anon page--perhaps
> a stack page--or truncated a mlocked, mmaped file.
>
> > unevictable_pgs_stranded 0
> > unevictable_pgs_mlockfreed 0
> >
> > root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> > Unevictable: 8 kB
> > Mlocked: 8 kB
> >
> > after dropping cache
> >
> > root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> > Unevictable: 4 kB
> > Mlocked: 4 kB
>
> Same effect I was seeing. Two extra mlock counts until we drop cache.
> Then only 1. Interesting.
>
> >
> >
> > 2) with munlockall
> >
> > barrios-target@barrios-target-linux:~$ tail -8 /proc/vmstat
> > unevictable_pgs_culled 0
> > unevictable_pgs_scanned 0
> > unevictable_pgs_rescued 0
> > unevictable_pgs_mlocked 0
> > unevictable_pgs_munlocked 0
> > unevictable_pgs_cleared 0
> > unevictable_pgs_stranded 0
> > unevictable_pgs_mlockfreed 0
> >
> > barrios-target@barrios-target-linux:~$ cat /proc/meminfo | egrep 'Mlo|Unev'
> > Unevictable: 0 kB
> > Mlocked: 0 kB
> >
> > after
> >
> >
> > root@barrios-target-linux:~# tail -8 /proc/vmstat
> > unevictable_pgs_culled 369
> > unevictable_pgs_scanned 0
> > unevictable_pgs_rescued 389
> > unevictable_pgs_mlocked 389
> > unevictable_pgs_munlocked 389
> > unevictable_pgs_cleared 0
> > unevictable_pgs_stranded 0
> > unevictable_pgs_mlockfreed 0
> >
> > root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> > Unevictable: 0 kB
> > Mlocked: 0 kB
> >
> > Both tests have to show same result.
> > But is didn't.
> >
> > I think it's not per-cpu problem.
> >
> > When I digged in the source, I found that.
> > In case of test without munlockall, try_to_unmap_file calls try_to_mlock_page
>
> This I don't understand. exit_mmap() calls munlock_vma_pages_all() for
> all VM_LOCKED vmas. This should have the same effect as calling
> mlock_fixup() without VM_LOCKED flags, which munlockall() does.

As I said earlier, the difference is whether mmap_sem is held for write.
In the exit_mmap case, mmap_sem is held for read,
but in the munlockall case, mmap_sem is held for write,
so down_read_trylock in try_to_mlock_page will fail.

>
>
> > since some pages are mapped several vmas(I don't know why that pages is shared
> > other vma in same process.
>
> Isn't necessarily in the same task. We're traversing the list of vma's
> associated with a single anon_vma. This includes all ancestors and
> descendants that haven't exec'd. Of course, I don't see a fork() in
> either of your test programs, so I don't know what's happening.

I agree. We have to traverse the list of vmas.
In my case, the first page of my test program's image is mapped by two vmas:
one is the code vma, the other is the data vma.
I don't know why both the code and data vmas include the program's first page.

>
> > One of page which i have seen is test program's
> > first code page[page->index : 0 vma->vm_pgoff : 0]. It was mapped by data vma, too).
> > Other vma have VM_LOCKED.
> > So try_to_unmap_file calls try_to_mlock_page. Then, After calling
> > successful down_read_try_lock call, mlock_vma_page increased mlocked
> > counter again.
> >
> > In case of test with munlockall, try_to_mlock_page's down_read_trylock
> > couldn't be sucessful. That's because munlockall called down_write.
> > At result, try_to_mlock_page cannot call try_to_mlock_page. so, mlocked counter
> > don't increased. I think it's not right.
> > But fortunately Mlocked number is right. :(
>
>
> I'll try with your 2nd test program [sent via separate mail] and try the
> fix above. I also want to understand the difference between exit_mmap()
> for a task that called mlockall() and the munlockall() case.
>

Thanks for taking an interest in this problem. :)

> Regards,
> Lee

--
Kind regards,
MinChan Kim
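
The locking difference described in the message above can be modelled in a few lines. This is a single-threaded toy, not the kernel's rw_semaphore: every `toy_` name is hypothetical, and the claim that the exit path leaves the read trylock free to succeed while munlockall() holds the semaphore for write is taken from the message, not verified here.

```c
#include <stdbool.h>

/* Toy stand-in for mmap_sem: any number of readers, or one writer. */
struct toy_rwsem {
	int readers;
	bool writer;
};

/* Take the semaphore for write if nobody holds it. */
static bool toy_down_write_trylock(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

/* Take the semaphore for read unless a writer holds it. */
static bool toy_down_read_trylock(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *s)
{
	s->readers--;
}

/*
 * Models the decision described in the thread: the mlocked count is
 * bumped again only when the read trylock succeeds.
 */
static int toy_try_to_mlock_page(struct toy_rwsem *s, long *mlocked)
{
	if (!toy_down_read_trylock(s))
		return 0;	/* writer holds the semaphore: skip */
	(*mlocked)++;
	toy_up_read(s);
	return 1;
}
```

In this model, calling `toy_try_to_mlock_page()` while the semaphore is free or read-held increments the count, and calling it after `toy_down_write_trylock()` has succeeded leaves the count unchanged, which matches the difference between the two test runs as MinChan describes it.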
* Re: [BUG] mlocked page counter mismatch 2009-01-29 1:48 ` Lee Schermerhorn 2009-01-29 4:29 ` MinChan Kim @ 2009-01-29 12:35 ` KOSAKI Motohiro 2009-01-29 14:44 ` Lee Schermerhorn 1 sibling, 1 reply; 9+ messages in thread From: KOSAKI Motohiro @ 2009-01-29 12:35 UTC (permalink / raw) To: Lee Schermerhorn; +Cc: MinChan Kim, linux mm, linux kernel, Nick Piggin Hi > I think I see it. In try_to_unmap_anon(), called from try_to_munlock(), > we have: > > list_for_each_entry(vma, &anon_vma->head, anon_vma_node) { > if (MLOCK_PAGES && unlikely(unlock)) { > if (!((vma->vm_flags & VM_LOCKED) && > !!! should be '||' ? ^^ > page_mapped_in_vma(page, vma))) > continue; /* must visit all unlocked vmas */ > ret = SWAP_MLOCK; /* saw at least one mlocked vma */ > } else { > ret = try_to_unmap_one(page, vma, migration); > if (ret == SWAP_FAIL || !page_mapped(page)) > break; > } > if (ret == SWAP_MLOCK) { > mlocked = try_to_mlock_page(page, vma); > if (mlocked) > break; /* stop if actually mlocked page */ > } > } > > or that clause [under if (MLOCK_PAGES && unlikely(unlock))] > might be clearer as: > > if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma)) > ret = SWAP_MLOCK; /* saw at least one mlocked vma */ > else > continue; /* must visit all unlocked vmas */ > > Do you agree? Hmmm. I don't think so. > if (!((vma->vm_flags & VM_LOCKED) && > page_mapped_in_vma(page, vma))) > continue; /* must visit all unlocked vmas */ is already equivalent to > if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma)) > ret = SWAP_MLOCK; /* saw at least one mlocked vma */ > else > continue; /* must visit all unlocked vmas */ > And, I wonder if we need a similar check for > page_mapped_in_vma(page, vma) up in try_to_unmap_one()? because page_mapped_in_vma() can return 0 if vma is anon vma only. In the other word, struct adress_space (for file) gurantee that unrelated vma doesn't chained. but struct anon_vma (for anon) doesn't gurantee that unrelated vma doesn't chained. 
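The equivalence claimed in the reply above can be checked exhaustively, since only two boolean conditions are involved. The following is a minimal sketch, assuming hypothetical helper names (`negated_form`, `positive_form`, `or_variant`, `forms_agree`) and a hypothetical `enum outcome` in place of the kernel's `continue` / `ret = SWAP_MLOCK` paths; the two booleans stand in for `vma->vm_flags & VM_LOCKED` and `page_mapped_in_vma(page, vma)`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two loop outcomes; not kernel values. */
enum outcome { SKIP_VMA, SAW_MLOCKED_VMA };

/* Form in the quoted loop: skip unless the vma is locked AND maps the page. */
static enum outcome negated_form(bool vm_locked, bool mapped_in_vma)
{
    if (!(vm_locked && mapped_in_vma))
        return SKIP_VMA;            /* corresponds to "continue" */
    return SAW_MLOCKED_VMA;         /* corresponds to ret = SWAP_MLOCK */
}

/* Form proposed as "clearer": same test, branches swapped. */
static enum outcome positive_form(bool vm_locked, bool mapped_in_vma)
{
    if (vm_locked && mapped_in_vma)
        return SAW_MLOCKED_VMA;
    else
        return SKIP_VMA;
}

/* The '||' variant raised as a question in the quoted annotation. */
static enum outcome or_variant(bool vm_locked, bool mapped_in_vma)
{
    if (!(vm_locked || mapped_in_vma))
        return SKIP_VMA;
    return SAW_MLOCKED_VMA;
}

/* Returns true when the first two forms agree on all four input combinations. */
static bool forms_agree(void)
{
    for (int locked = 0; locked <= 1; locked++)
        for (int mapped = 0; mapped <= 1; mapped++)
            if (negated_form(locked, mapped) != positive_form(locked, mapped))
                return false;
    return true;
}
```

In this model the `||` variant differs from the other two whenever exactly one condition holds: for example, a VM_LOCKED vma that does not map the page would be reported as an mlocked vma.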
* Re: [BUG] mlocked page counter mismatch
2009-01-29 12:35 ` KOSAKI Motohiro
@ 2009-01-29 14:44 ` Lee Schermerhorn
0 siblings, 0 replies; 9+ messages in thread
From: Lee Schermerhorn @ 2009-01-29 14:44 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: MinChan Kim, linux mm, linux kernel, Nick Piggin
On Thu, 2009-01-29 at 21:35 +0900, KOSAKI Motohiro wrote:
> Hi
>
> > I think I see it. In try_to_unmap_anon(), called from try_to_munlock(),
> > we have:
> >
> >         list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
> >                 if (MLOCK_PAGES && unlikely(unlock)) {
> >                         if (!((vma->vm_flags & VM_LOCKED) &&
> > !!! should be '||' ?                                      ^^
> >                               page_mapped_in_vma(page, vma)))
> >                                 continue;  /* must visit all unlocked vmas */
> >                         ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
> >                 } else {
> >                         ret = try_to_unmap_one(page, vma, migration);
> >                         if (ret == SWAP_FAIL || !page_mapped(page))
> >                                 break;
> >                 }
> >                 if (ret == SWAP_MLOCK) {
> >                         mlocked = try_to_mlock_page(page, vma);
> >                         if (mlocked)
> >                                 break;  /* stop if actually mlocked page */
> >                 }
> >         }
> >
> > or that clause [under if (MLOCK_PAGES && unlikely(unlock))]
> > might be clearer as:
> >
> >         if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma))
> >                 ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
> >         else
> >                 continue;  /* must visit all unlocked vmas */
> >
> > Do you agree?
>
> Hmmm.
> I don't think so.
>
> >                         if (!((vma->vm_flags & VM_LOCKED) &&
> >                               page_mapped_in_vma(page, vma)))
> >                                 continue;  /* must visit all unlocked vmas */
>
> is already equivalent to
>
> >         if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma))
> >                 ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
> >         else
> >                 continue;  /* must visit all unlocked vmas */

Hmmm, I should know not to try to read code when I'm that sleepy. Had
myself convinced that the condition was wrong...

> >
> > And, I wonder if we need a similar check for
> > page_mapped_in_vma(page, vma) up in try_to_unmap_one()?
>
> because page_mapped_in_vma() can return 0 if vma is anon vma only.

By "vma is anon vma only", do you mean that the vma being tested--e.g.,
that we found to be VM_LOCKED--no longer has the page mapped in its page
tables? That is its purpose--to detect this condition. IIRC, Rik added
it during testing of the mlock patches when we discovered we were
mlocking pages because

>
> In other words,
> struct address_space (for file) guarantees that unrelated vmas are not chained.

Right. That's why we don't have the page_mapped_in_vma() check in
try_to_unmap_file().

> but struct anon_vma (for anon) doesn't guarantee that unrelated vmas
> are not chained.

Well, they're not exactly "unrelated"--vmas attached to an anon_vma are
from the same "family". Any pages that haven't been COWed will still be
mapped into multiple mm's.

My question last night about try_to_unmap_one() wasn't really related to
the mlock statistics glitch. Sorry I wasn't more clear about this. I
was wondering about the case where shrink_page_list was trying to unmap
a page whose vma was on an anon_vma list with other VM_LOCKED vmas that
didn't actually map the page. However, in the early morning light, I
see that the call to page_check_address() handles this.

----------

Anyway, our original responses to this report crossed in the mail. You
said you'd handle it. So, in the meantime, I'm looking at the
mmap()/vm_merge()/mlock_vma_pages_range() issue reported yesterday.

Regards,
Lee
end of thread, other threads:[~2009-01-29 14:44 UTC | newest]

Thread overview: 9+ messages
2009-01-28 10:28 [BUG] mlocked page counter mismatch MinChan Kim
2009-01-28 14:38 ` KOSAKI Motohiro
2009-01-28 15:33 ` Lee Schermerhorn
2009-01-28 23:55 ` MinChan Kim
2009-01-28 23:57 ` MinChan Kim
2009-01-29 1:48 ` Lee Schermerhorn
2009-01-29 4:29 ` MinChan Kim
2009-01-29 12:35 ` KOSAKI Motohiro
2009-01-29 14:44 ` Lee Schermerhorn