[BUG] mlocked page counter mismatch

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [BUG] mlocked page counter mismatch
@ 2009-01-28 10:28 MinChan Kim
  2009-01-28 14:38 ` KOSAKI Motohiro
  2009-01-28 15:33 ` Lee Schermerhorn
  0 siblings, 2 replies; 9+ messages in thread
From: MinChan Kim @ 2009-01-28 10:28 UTC (permalink / raw)
  To: linux mm; +Cc: linux kernel, Lee Schermerhorn, Nick Piggin

After executing following program, 'cat /proc/meminfo' shows
following result. 

--
# cat /proc/meminfo 
..
Unevictable:           8 kB
Mlocked:               8 kB
..

-- test program --

#include <stdio.h>
#include <sys/mman.h>
int main()
{
        char buf[64] = {0,};
        char *ptr;
        int k;
        int i,j;
        int x,y;
        mlockall(MCL_CURRENT);
        sprintf(buf, "cat /proc/%d/maps", getpid());
        system(buf);
        return 0;
}

--

It seems mlocked page counter have a problem.
After I diged in source, I found that try_to_unmap_file called 
try_to_mlock_page about shared mapping pages 
since other vma had VM_LOCKED flag.
After all, try_to_mlock_page called mlock_vma_page. 
so, mlocked counter increased

But, After I called munlockall intentionally, the counter work well. 
In case of munlockall, we already had a mmap_sem about write. 
Such a case, try_to_mlock_page can't call mlock_vma_page.
so, mlocked counter didn't increased. 
As a result, the counter seems to be work well but I think 
it also have a problem.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mlocked page counter mismatch
  2009-01-28 10:28 [BUG] mlocked page counter mismatch MinChan Kim
@ 2009-01-28 14:38 ` KOSAKI Motohiro
  2009-01-28 15:33 ` Lee Schermerhorn
  1 sibling, 0 replies; 9+ messages in thread
From: KOSAKI Motohiro @ 2009-01-28 14:38 UTC (permalink / raw)
  To: MinChan Kim; +Cc: linux mm, linux kernel, Lee Schermerhorn, Nick Piggin

2009/1/28 MinChan Kim <minchan.kim@gmail.com>:
>
> After executing following program, 'cat /proc/meminfo' shows
> following result.
>
> --
> # cat /proc/meminfo
> ..
> Unevictable:           8 kB
> Mlocked:               8 kB
> ..

ok, I'll hand this bug.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mlocked page counter mismatch
  2009-01-28 10:28 [BUG] mlocked page counter mismatch MinChan Kim
  2009-01-28 14:38 ` KOSAKI Motohiro
@ 2009-01-28 15:33 ` Lee Schermerhorn
  2009-01-28 23:55   ` MinChan Kim
  1 sibling, 1 reply; 9+ messages in thread
From: Lee Schermerhorn @ 2009-01-28 15:33 UTC (permalink / raw)
  To: MinChan Kim; +Cc: linux mm, linux kernel, Nick Piggin, KOSAKI Motohiro

On Wed, 2009-01-28 at 19:28 +0900, MinChan Kim wrote:
> After executing following program, 'cat /proc/meminfo' shows
> following result. 
> 
> --
> # cat /proc/meminfo 
> ..
> Unevictable:           8 kB
> Mlocked:               8 kB
> ..

Sorry, from your description, I can't understand what the problem is.
Are you saying that 8kB [2 pages] remains locked after you run your
test?

What did meminfo show before running the test program?  And what kernel
version?

> 
> -- test program --
> 
> #include <stdio.h>
> #include <sys/mman.h>
> int main()
> {
>         char buf[64] = {0,};
>         char *ptr;
>         int k;
>         int i,j;
>         int x,y;
>         mlockall(MCL_CURRENT);
>         sprintf(buf, "cat /proc/%d/maps", getpid());
>         system(buf);
>         return 0;
> }
> 
> --
> 
> It seems mlocked page counter have a problem.
> After I diged in source, I found that try_to_unmap_file called 
> try_to_mlock_page about shared mapping pages 
> since other vma had VM_LOCKED flag.
> After all, try_to_mlock_page called mlock_vma_page. 
> so, mlocked counter increased

This path of try_to_unmap_file() -> try_to_mlock_page() should only be
invoked during reclaim--from shrink_page_list().  [try_to_unmap() is
also called from page migration, but in this case, try_to_unmap_one()
won't return SWAP_MLOCK so we don't call try_to_mlock_page().]  Unless
your system is in continuous reclaim, I don't think you'd hit this
during your test program.

> 
> But, After I called munlockall intentionally, the counter work well. 
> In case of munlockall, we already had a mmap_sem about write. 
> Such a case, try_to_mlock_page can't call mlock_vma_page.
> so, mlocked counter didn't increased. 
> As a result, the counter seems to be work well but I think 
> it also have a problem.

I THINK this is a artifact of the way stats are accumulated in per cpu
differential counters and pushed to the zone state accumulators when a
threshold is reached.  I've seen this condition before, but it
eventually clears itself as the stats get pushed to the zone state.
Still, it bears more investigation, as it's been a while since I worked
on this and some subsequent fixes could have broken it:

I ran your test program on one of our x86_64 test systems running
2.6.29-rc2-mmotm-090116-1618, immediately after boot.  Here's what I
saw:

## before:
#cat /proc/meminfo | egrep 'Unev|Mlo'
Unevictable:        3448 kB
Mlocked:            3448 kB	# = 862 4k pages

This is the usual case on this platform. I'm using a RHEL5 distro
installation and one of the system daemons mlocks itself.  I forget
which.  I'll need to investigate further.  Also, it's possible that this
value itself is lower than the actual number of mlocked pages because
some of the counts may still be in the per cpu differential counters.

#tail -8 /proc/vmstat
unevictable_pgs_culled 738
unevictable_pgs_scanned 0
unevictable_pgs_rescued 117  # = 89 + 28 - OK
unevictable_pgs_mlocked 979  # 979 - 117 = 862 remaining locked
unevictable_pgs_munlocked 89
unevictable_pgs_cleared 28
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

So far, so good.  Now, run your test:

#./mck-mlock-test
<snip the map output>
#cat /proc/meminfo | egrep 'Unev|Mlo'
Unevictable:        3460 kB
Mlocked:            3460 kB	# = 865 pages;  3 more than above

# tail -8 /proc/vmstat
unevictable_pgs_culled 757
unevictable_pgs_scanned 0
unevictable_pgs_rescued 154	# = 125 + 29 - OK
unevictable_pgs_mlocked 1374	# 1374 - 154 = 1220 ???? 
unevictable_pgs_munlocked 125
unevictable_pgs_cleared 29
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

So, we have 3 additional pages shown as unevictable; and our stats don't
add up.  We see way more pages mlocked than munlocked/cleared.  [Aside:
clear happens on file truncation and COW of mlocked pages.]  I wonder if
this is the result of removing some of the lru_add_drain_all() calls
that we used to have in the mlock code to improve the statistics.  We
don't seem to have stranded any pages--that is, left them unevictable
because we couldn't isolate them from the lru for munlock.

If I drop caches, or run a moderately heavy mlock test--both of which
generate quite a bit of zone and vmstat activity--the meminfo values
become:

Unevictable:        3456 kB
Mlocked:            3456 kB

Which is two more mlocked pages than we saw right after boot.  If I
rerun your test repeatedly, the values always show as 3460kB.  Dropping
caches always restores it to 3456kB.  This may be the correct value with
the per cpu differential values pushed.  You could try dropping the page
cache and see what the values are on your system.  You could also add
the following line to your test program before the call to mlockall()
and after the existing call to system():

	system("cat /proc/meminfo | egrep 'Unev|Mlo'");

I will add this to my list of things to be investigated, but I won't get
to it for a while.  If I see more evidence that the counters are,
indeed, broken, I'll try to bump the priority.

Thanks,
Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mlocked page counter mismatch
  2009-01-28 15:33 ` Lee Schermerhorn
@ 2009-01-28 23:55   ` MinChan Kim
  2009-01-28 23:57     ` MinChan Kim
  2009-01-29  1:48     ` Lee Schermerhorn
  0 siblings, 2 replies; 9+ messages in thread
From: MinChan Kim @ 2009-01-28 23:55 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: linux mm, linux kernel, Nick Piggin, KOSAKI Motohiro

On Wed, Jan 28, 2009 at 10:33:52AM -0500, Lee Schermerhorn wrote:
> On Wed, 2009-01-28 at 19:28 +0900, MinChan Kim wrote:
> > After executing following program, 'cat /proc/meminfo' shows
> > following result. 
> > 
> > --
> > # cat /proc/meminfo 
> > ..
> > Unevictable:           8 kB
> > Mlocked:               8 kB
> > ..
> 
> Sorry, from your description, I can't understand what the problem is.
> Are you saying that 8kB [2 pages] remains locked after you run your
> test?

Yes. 
Sorry. My explanation was not enought. 

> 
> What did meminfo show before running the test program?  And what kernel
> version?

The meminfo showed mlocked, unevictable pages was zero. 
My kernel version is 2.6.29-rc2. 

> 
> > 
> > -- test program --
> > 
> > #include <stdio.h>
> > #include <sys/mman.h>
> > int main()
> > {
> >         char buf[64] = {0,};
> >         char *ptr;
> >         int k;
> >         int i,j;
> >         int x,y;
> >         mlockall(MCL_CURRENT);
> >         sprintf(buf, "cat /proc/%d/maps", getpid());
> >         system(buf);
> >         return 0;
> > }
> > 
> > --
> > 
> > It seems mlocked page counter have a problem.
> > After I diged in source, I found that try_to_unmap_file called 
> > try_to_mlock_page about shared mapping pages 
> > since other vma had VM_LOCKED flag.
> > After all, try_to_mlock_page called mlock_vma_page. 
> > so, mlocked counter increased
> 
> This path of try_to_unmap_file() -> try_to_mlock_page() should only be
> invoked during reclaim--from shrink_page_list().  [try_to_unmap() is
> also called from page migration, but in this case, try_to_unmap_one()
> won't return SWAP_MLOCK so we don't call try_to_mlock_page().]  Unless
> your system is in continuous reclaim, I don't think you'd hit this
> during your test program.

My system was not reclaim mode. It could be called following path. 
exit_mmap -> munlock_vma_pages_all->munlock_vma_page->try_to_munlock->
try_to_unmap_file->try_to_mlock_page

> 
> > 
> > But, After I called munlockall intentionally, the counter work well. 
> > In case of munlockall, we already had a mmap_sem about write. 
> > Such a case, try_to_mlock_page can't call mlock_vma_page.
> > so, mlocked counter didn't increased. 
> > As a result, the counter seems to be work well but I think 
> > it also have a problem.
> 
> I THINK this is a artifact of the way stats are accumulated in per cpu
> differential counters and pushed to the zone state accumulators when a
> threshold is reached.  I've seen this condition before, but it
> eventually clears itself as the stats get pushed to the zone state.
> Still, it bears more investigation, as it's been a while since I worked
> on this and some subsequent fixes could have broken it:

Hmm... My test result is as follow. 

1) without munlockall
before:

root@barrios-target-linux:~# tail -8 /proc/vmstat 
unevictable_pgs_culled 0
unevictable_pgs_scanned 0
unevictable_pgs_rescued 0
unevictable_pgs_mlocked 0
unevictable_pgs_munlocked 0
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable:           0 kB
Mlocked:               0 kB

after:
root@barrios-target-linux:~# tail -8 /proc/vmstat 
unevictable_pgs_culled 369
unevictable_pgs_scanned 0
unevictable_pgs_rescued 388
unevictable_pgs_mlocked 392
unevictable_pgs_munlocked 387
unevictable_pgs_cleared 1
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable:           8 kB
Mlocked:               8 kB

after dropping cache

root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable:           4 kB
Mlocked:               4 kB


2) with munlockall 

barrios-target@barrios-target-linux:~$ tail -8 /proc/vmstat
unevictable_pgs_culled 0
unevictable_pgs_scanned 0
unevictable_pgs_rescued 0
unevictable_pgs_mlocked 0
unevictable_pgs_munlocked 0
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

barrios-target@barrios-target-linux:~$ cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable:           0 kB
Mlocked:               0 kB

after


root@barrios-target-linux:~# tail -8 /proc/vmstat
unevictable_pgs_culled 369
unevictable_pgs_scanned 0
unevictable_pgs_rescued 389
unevictable_pgs_mlocked 389
unevictable_pgs_munlocked 389
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable:           0 kB
Mlocked:               0 kB

Both tests have to show same result. 
But is didn't. 

I think it's not per-cpu problem. 

When I digged in the source, I found that. 
In case of test without munlockall, try_to_unmap_file calls try_to_mlock_page 
since some pages are mapped several vmas(I don't know why that pages is shared 
other vma in same process. One of page which i have seen is test program's 
first code page[page->index : 0 vma->vm_pgoff : 0]. It was mapped by data vma, too). 
Other vma have VM_LOCKED.
So try_to_unmap_file calls try_to_mlock_page. Then, After calling 
successful down_read_try_lock call, mlock_vma_page increased mlocked
counter again. 

In case of test with munlockall, try_to_mlock_page's down_read_trylock 
couldn't be sucessful. That's because munlockall called down_write.
At result, try_to_mlock_page cannot call try_to_mlock_page. so, mlocked counter 
don't increased. I think it's not right. 
But fortunately Mlocked number is right. :(

--
Kinds Regards
MinChan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mlocked page counter mismatch
  2009-01-28 23:55   ` MinChan Kim
@ 2009-01-28 23:57     ` MinChan Kim
  2009-01-29  1:48     ` Lee Schermerhorn
  1 sibling, 0 replies; 9+ messages in thread
From: MinChan Kim @ 2009-01-28 23:57 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: linux mm, linux kernel, Nick Piggin, KOSAKI Motohiro

I missed my test program.

#include <stdio.h>
#include <sys/mman.h>
int main()
{
        mlockall(MCL_CURRENT);
        // munlockall();
        return 0;
}


-- 
Kinds regards,
MinChan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mlocked page counter mismatch
  2009-01-28 23:55   ` MinChan Kim
  2009-01-28 23:57     ` MinChan Kim
@ 2009-01-29  1:48     ` Lee Schermerhorn
  2009-01-29  4:29       ` MinChan Kim
  2009-01-29 12:35       ` KOSAKI Motohiro
  1 sibling, 2 replies; 9+ messages in thread
From: Lee Schermerhorn @ 2009-01-29  1:48 UTC (permalink / raw)
  To: MinChan Kim; +Cc: linux mm, linux kernel, Nick Piggin, KOSAKI Motohiro

On Thu, 2009-01-29 at 08:55 +0900, MinChan Kim wrote:
> On Wed, Jan 28, 2009 at 10:33:52AM -0500, Lee Schermerhorn wrote:
> > On Wed, 2009-01-28 at 19:28 +0900, MinChan Kim wrote:
> > > After executing following program, 'cat /proc/meminfo' shows
> > > following result. 
> > > 
> > > --
> > > # cat /proc/meminfo 
> > > ..
> > > Unevictable:           8 kB
> > > Mlocked:               8 kB
> > > ..
> > 
> > Sorry, from your description, I can't understand what the problem is.
> > Are you saying that 8kB [2 pages] remains locked after you run your
> > test?
> 
> Yes. 
> Sorry. My explanation was not enought. 
> 
> > 
> > What did meminfo show before running the test program?  And what kernel
> > version?
> 
> The meminfo showed mlocked, unevictable pages was zero. 
> My kernel version is 2.6.29-rc2. 

OK, thanks.
> 
> > 
> > > 
> > > -- test program --
> > > 
> > > #include <stdio.h>
> > > #include <sys/mman.h>
> > > int main()
> > > {
> > >         char buf[64] = {0,};
> > >         char *ptr;
> > >         int k;
> > >         int i,j;
> > >         int x,y;
> > >         mlockall(MCL_CURRENT);
> > >         sprintf(buf, "cat /proc/%d/maps", getpid());
> > >         system(buf);
> > >         return 0;
> > > }
> > > 
> > > --
> > > 
> > > It seems mlocked page counter have a problem.
> > > After I diged in source, I found that try_to_unmap_file called 
> > > try_to_mlock_page about shared mapping pages 
> > > since other vma had VM_LOCKED flag.
> > > After all, try_to_mlock_page called mlock_vma_page. 
> > > so, mlocked counter increased
> > 
> > This path of try_to_unmap_file() -> try_to_mlock_page() should only be
> > invoked during reclaim--from shrink_page_list().  [try_to_unmap() is
> > also called from page migration, but in this case, try_to_unmap_one()
> > won't return SWAP_MLOCK so we don't call try_to_mlock_page().]  Unless
> > your system is in continuous reclaim, I don't think you'd hit this
> > during your test program.
> 
> My system was not reclaim mode. It could be called following path. 
> exit_mmap -> munlock_vma_pages_all->munlock_vma_page->try_to_munlock->
> try_to_unmap_file->try_to_mlock_page

Ah.  Yes.  Well, try_to_mlock_page() should only call mlock_vma_page()
if some other vma that maps the pages is VM_LOCKED.  The vma in the task
calling try_to_munlock() should have already cleared VM_LOCKED for the
vma.  However, we need to ensure that the page is actually mapped in the
address range of any VM_LOCKED vma.  I recall that Rik discovered this
back during testing and fixed it, but perhaps it was another path.  

Looks at code again....

I think I see it.  In try_to_unmap_anon(), called from try_to_munlock(),
we have:

         list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
                if (MLOCK_PAGES && unlikely(unlock)) {
                        if (!((vma->vm_flags & VM_LOCKED) &&
!!! should be '||' ?                                      ^^
                              page_mapped_in_vma(page, vma)))
                                continue;  /* must visit all unlocked vmas */
                        ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
                } else {
                        ret = try_to_unmap_one(page, vma, migration);
                        if (ret == SWAP_FAIL || !page_mapped(page))
                                break;
                }
                if (ret == SWAP_MLOCK) {
                        mlocked = try_to_mlock_page(page, vma);
                        if (mlocked)
                                break;  /* stop if actually mlocked page */
                }
        }

or that clause [under if (MLOCK_PAGES && unlikely(unlock))]
might be clearer as:

               if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma))
                      ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
               else
                      continue;  /* must visit all unlocked vmas */


Do you agree?

And, I wonder if we need a similar check for 
page_mapped_in_vma(page, vma) up in try_to_unmap_one()?

> 
> > 
> > > 
> > > But, After I called munlockall intentionally, the counter work well. 
> > > In case of munlockall, we already had a mmap_sem about write. 
> > > Such a case, try_to_mlock_page can't call mlock_vma_page.
> > > so, mlocked counter didn't increased. 
> > > As a result, the counter seems to be work well but I think 
> > > it also have a problem.
> > 
> > I THINK this is a artifact of the way stats are accumulated in per cpu
> > differential counters and pushed to the zone state accumulators when a
> > threshold is reached.  I've seen this condition before, but it
> > eventually clears itself as the stats get pushed to the zone state.
> > Still, it bears more investigation, as it's been a while since I worked
> > on this and some subsequent fixes could have broken it:
> 
> Hmm... My test result is as follow. 
> 
> 1) without munlockall
> before:
> 
> root@barrios-target-linux:~# tail -8 /proc/vmstat 
> unevictable_pgs_culled 0
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 0
> unevictable_pgs_mlocked 0
> unevictable_pgs_munlocked 0
> unevictable_pgs_cleared 0
> unevictable_pgs_stranded 0
> unevictable_pgs_mlockfreed 0
> 
> root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> Unevictable:           0 kB
> Mlocked:               0 kB
> 
> after:
> root@barrios-target-linux:~# tail -8 /proc/vmstat 
> unevictable_pgs_culled 369
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 388
> unevictable_pgs_mlocked 392
> unevictable_pgs_munlocked 387
> unevictable_pgs_cleared 1

this looks like either some task forked and COWed an anon page--perhaps
a stack page--or truncated a mlocked, mmaped file.  

> unevictable_pgs_stranded 0
> unevictable_pgs_mlockfreed 0
> 
> root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> Unevictable:           8 kB
> Mlocked:               8 kB
> 
> after dropping cache
> 
> root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> Unevictable:           4 kB
> Mlocked:               4 kB

Same effect I was seeing.  Two extra mlock counts until we drop cache.
Then only 1.  Interesting.

> 
> 
> 2) with munlockall 
> 
> barrios-target@barrios-target-linux:~$ tail -8 /proc/vmstat
> unevictable_pgs_culled 0
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 0
> unevictable_pgs_mlocked 0
> unevictable_pgs_munlocked 0
> unevictable_pgs_cleared 0
> unevictable_pgs_stranded 0
> unevictable_pgs_mlockfreed 0
> 
> barrios-target@barrios-target-linux:~$ cat /proc/meminfo | egrep 'Mlo|Unev'
> Unevictable:           0 kB
> Mlocked:               0 kB
> 
> after
> 
> 
> root@barrios-target-linux:~# tail -8 /proc/vmstat
> unevictable_pgs_culled 369
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 389
> unevictable_pgs_mlocked 389
> unevictable_pgs_munlocked 389
> unevictable_pgs_cleared 0
> unevictable_pgs_stranded 0
> unevictable_pgs_mlockfreed 0
> 
> root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> Unevictable:           0 kB
> Mlocked:               0 kB
> 
> Both tests have to show same result. 
> But is didn't. 
> 
> I think it's not per-cpu problem. 
> 
> When I digged in the source, I found that. 
> In case of test without munlockall, try_to_unmap_file calls try_to_mlock_page 

This I don't understand.  exit_mmap() calls munlock_vma_pages_all() for
all VM_LOCKED vmas.  This should have the same effect as calling
mlock_fixup() without VM_LOCKED flags, which munlockall() does.


> since some pages are mapped several vmas(I don't know why that pages is shared 
> other vma in same process. 

Isn't necessarily in the same task.  We're traversing the list of vma's
associated with a single anon_vma.  This includes all ancestors and
descendants that haven't exec'd.  Of course, I don't see a fork() in
either of your test programs, so I don't know what's happening.

> One of page which i have seen is test program's 
> first code page[page->index : 0 vma->vm_pgoff : 0]. It was mapped by data vma, too). 
> Other vma have VM_LOCKED.
> So try_to_unmap_file calls try_to_mlock_page. Then, After calling 
> successful down_read_try_lock call, mlock_vma_page increased mlocked
> counter again. 
> 
> In case of test with munlockall, try_to_mlock_page's down_read_trylock 
> couldn't be sucessful. That's because munlockall called down_write.
> At result, try_to_mlock_page cannot call try_to_mlock_page. so, mlocked counter 
> don't increased. I think it's not right. 
> But fortunately Mlocked number is right. :(


I'll try with your 2nd test program [sent via separate mail] and try the
fix above.  I also want to understand the difference between exit_mmap()
for a task that called mlockall() and the munlockall() case.

Regards,
Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mlocked page counter mismatch
  2009-01-29  1:48     ` Lee Schermerhorn
@ 2009-01-29  4:29       ` MinChan Kim
  2009-01-29 12:35       ` KOSAKI Motohiro
  1 sibling, 0 replies; 9+ messages in thread
From: MinChan Kim @ 2009-01-29  4:29 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: linux mm, linux kernel, Nick Piggin, KOSAKI Motohiro

Sorry for late response. 

> Looks at code again....
> 
> I think I see it.  In try_to_unmap_anon(), called from try_to_munlock(),
> we have:
> 
>          list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
>                 if (MLOCK_PAGES && unlikely(unlock)) {
>                         if (!((vma->vm_flags & VM_LOCKED) &&
> !!! should be '||' ?                                      ^^
>                               page_mapped_in_vma(page, vma)))
>                                 continue;  /* must visit all unlocked vmas */
>                         ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
>                 } else {
>                         ret = try_to_unmap_one(page, vma, migration);
>                         if (ret == SWAP_FAIL || !page_mapped(page))
>                                 break;
>                 }
>                 if (ret == SWAP_MLOCK) {
>                         mlocked = try_to_mlock_page(page, vma);
>                         if (mlocked)
>                                 break;  /* stop if actually mlocked page */
>                 }
>         }
> 
> or that clause [under if (MLOCK_PAGES && unlikely(unlock))]
> might be clearer as:
> 
>                if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma))
>                       ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
>                else
>                       continue;  /* must visit all unlocked vmas */
> 
> 
> Do you agree?

I agree this. 
This is more clear. we have to check another process's vma which is linked 
by anon_vma. 
We also have to check it in try_to_unmap_file. 


> 
> And, I wonder if we need a similar check for 
> page_mapped_in_vma(page, vma) up in try_to_unmap_one()?
> 
> > 
> > > 
> > > > 
> > > > But, After I called munlockall intentionally, the counter work well. 
> > > > In case of munlockall, we already had a mmap_sem about write. 
> > > > Such a case, try_to_mlock_page can't call mlock_vma_page.
> > > > so, mlocked counter didn't increased. 
> > > > As a result, the counter seems to be work well but I think 
> > > > it also have a problem.
> > > 
> > > I THINK this is a artifact of the way stats are accumulated in per cpu
> > > differential counters and pushed to the zone state accumulators when a
> > > threshold is reached.  I've seen this condition before, but it
> > > eventually clears itself as the stats get pushed to the zone state.
> > > Still, it bears more investigation, as it's been a while since I worked
> > > on this and some subsequent fixes could have broken it:
> > 
> > Hmm... My test result is as follow. 
> > 
> > 1) without munlockall
> > before:
> > 
> > root@barrios-target-linux:~# tail -8 /proc/vmstat 
> > unevictable_pgs_culled 0
> > unevictable_pgs_scanned 0
> > unevictable_pgs_rescued 0
> > unevictable_pgs_mlocked 0
> > unevictable_pgs_munlocked 0
> > unevictable_pgs_cleared 0
> > unevictable_pgs_stranded 0
> > unevictable_pgs_mlockfreed 0
> > 
> > root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> > Unevictable:           0 kB
> > Mlocked:               0 kB
> > 
> > after:
> > root@barrios-target-linux:~# tail -8 /proc/vmstat 
> > unevictable_pgs_culled 369
> > unevictable_pgs_scanned 0
> > unevictable_pgs_rescued 388
> > unevictable_pgs_mlocked 392
> > unevictable_pgs_munlocked 387
> > unevictable_pgs_cleared 1
> 
> this looks like either some task forked and COWed an anon page--perhaps
> a stack page--or truncated a mlocked, mmaped file.  
> 
> > unevictable_pgs_stranded 0
> > unevictable_pgs_mlockfreed 0
> > 
> > root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> > Unevictable:           8 kB
> > Mlocked:               8 kB
> > 
> > after dropping cache
> > 
> > root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> > Unevictable:           4 kB
> > Mlocked:               4 kB
> 
> Same effect I was seeing.  Two extra mlock counts until we drop cache.
> Then only 1.  Interesting.
> 
> > 
> > 
> > 2) with munlockall 
> > 
> > barrios-target@barrios-target-linux:~$ tail -8 /proc/vmstat
> > unevictable_pgs_culled 0
> > unevictable_pgs_scanned 0
> > unevictable_pgs_rescued 0
> > unevictable_pgs_mlocked 0
> > unevictable_pgs_munlocked 0
> > unevictable_pgs_cleared 0
> > unevictable_pgs_stranded 0
> > unevictable_pgs_mlockfreed 0
> > 
> > barrios-target@barrios-target-linux:~$ cat /proc/meminfo | egrep 'Mlo|Unev'
> > Unevictable:           0 kB
> > Mlocked:               0 kB
> > 
> > after
> > 
> > 
> > root@barrios-target-linux:~# tail -8 /proc/vmstat
> > unevictable_pgs_culled 369
> > unevictable_pgs_scanned 0
> > unevictable_pgs_rescued 389
> > unevictable_pgs_mlocked 389
> > unevictable_pgs_munlocked 389
> > unevictable_pgs_cleared 0
> > unevictable_pgs_stranded 0
> > unevictable_pgs_mlockfreed 0
> > 
> > root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
> > Unevictable:           0 kB
> > Mlocked:               0 kB
> > 
> > Both tests have to show same result. 
> > But is didn't. 
> > 
> > I think it's not per-cpu problem. 
> > 
> > When I digged in the source, I found that. 
> > In case of test without munlockall, try_to_unmap_file calls try_to_mlock_page 
> 
> This I don't understand.  exit_mmap() calls munlock_vma_pages_all() for
> all VM_LOCKED vmas.  This should have the same effect as calling
> mlock_fixup() without VM_LOCKED flags, which munlockall() does.

I said early. The difference is write of mmap_sem. 
In case of exit_mmap, it have readlock of mmap_sem. 
but In case of munlockall, it have writelock of mmap_sem. 
so try_to_mlock_page will fail down_read_trylock. 

> 
> 
> > since some pages are mapped several vmas(I don't know why that pages is shared 
> > other vma in same process. 
> 
> Isn't necessarily in the same task.  We're traversing the list of vma's
> associated with a single anon_vma.  This includes all ancestors and
> descendants that haven't exec'd.  Of course, I don't see a fork() in
> either of your test programs, so I don't know what's happening.

I agree. we have to traverse list of vma's. 
In my case, my test program's image's first page is mapped two vma.
one is code vma. the other is data vma. I don't know why code and data vmas
include program's first page.
 
> 
> > One of page which i have seen is test program's 
> > first code page[page->index : 0 vma->vm_pgoff : 0]. It was mapped by data vma, too). 
> > Other vma have VM_LOCKED.
> > So try_to_unmap_file calls try_to_mlock_page. Then, After calling 
> > successful down_read_try_lock call, mlock_vma_page increased mlocked
> > counter again. 
> > 
> > In case of test with munlockall, try_to_mlock_page's down_read_trylock 
> > couldn't be sucessful. That's because munlockall called down_write.
> > At result, try_to_mlock_page cannot call try_to_mlock_page. so, mlocked counter 
> > don't increased. I think it's not right. 
> > But fortunately Mlocked number is right. :(
> 
> 
> I'll try with your 2nd test program [sent via separate mail] and try the
> fix above.  I also want to understand the difference between exit_mmap()
> for a task that called mlockall() and the munlockall() case.
> 

Thanks for having an interest in this problem. :)

> Regards,
> Lee

-- 
Kinds Regards
MinChan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mlocked page counter mismatch
  2009-01-29  1:48     ` Lee Schermerhorn
  2009-01-29  4:29       ` MinChan Kim
@ 2009-01-29 12:35       ` KOSAKI Motohiro
  2009-01-29 14:44         ` Lee Schermerhorn
  1 sibling, 1 reply; 9+ messages in thread
From: KOSAKI Motohiro @ 2009-01-29 12:35 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: MinChan Kim, linux mm, linux kernel, Nick Piggin

Hi

> I think I see it.  In try_to_unmap_anon(), called from try_to_munlock(),
> we have:
>
>         list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
>                if (MLOCK_PAGES && unlikely(unlock)) {
>                        if (!((vma->vm_flags & VM_LOCKED) &&
> !!! should be '||' ?                                      ^^
>                              page_mapped_in_vma(page, vma)))
>                                continue;  /* must visit all unlocked vmas */
>                        ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
>                } else {
>                        ret = try_to_unmap_one(page, vma, migration);
>                        if (ret == SWAP_FAIL || !page_mapped(page))
>                                break;
>                }
>                if (ret == SWAP_MLOCK) {
>                        mlocked = try_to_mlock_page(page, vma);
>                        if (mlocked)
>                                break;  /* stop if actually mlocked page */
>                }
>        }
>
> or that clause [under if (MLOCK_PAGES && unlikely(unlock))]
> might be clearer as:
>
>               if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma))
>                      ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
>               else
>                      continue;  /* must visit all unlocked vmas */
>
> Do you agree?

Hmmm.
I don't think so.

>                        if (!((vma->vm_flags & VM_LOCKED) &&
>                              page_mapped_in_vma(page, vma)))
>                                continue;  /* must visit all unlocked vmas */

is already equivalent to

>               if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma))
>                      ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
>               else
>                      continue;  /* must visit all unlocked vmas */


> And, I wonder if we need a similar check for
> page_mapped_in_vma(page, vma) up in try_to_unmap_one()?

because page_mapped_in_vma() can return 0 if vma is anon vma only.

In the other word,
struct adress_space (for file) gurantee that unrelated vma doesn't chained.
but struct anon_vma (for anon) doesn't gurantee that unrelated vma
doesn't chained.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] mlocked page counter mismatch
  2009-01-29 12:35       ` KOSAKI Motohiro
@ 2009-01-29 14:44         ` Lee Schermerhorn
  0 siblings, 0 replies; 9+ messages in thread
From: Lee Schermerhorn @ 2009-01-29 14:44 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: MinChan Kim, linux mm, linux kernel, Nick Piggin

On Thu, 2009-01-29 at 21:35 +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > I think I see it.  In try_to_unmap_anon(), called from try_to_munlock(),
> > we have:
> >
> >         list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
> >                if (MLOCK_PAGES && unlikely(unlock)) {
> >                        if (!((vma->vm_flags & VM_LOCKED) &&
> > !!! should be '||' ?                                      ^^
> >                              page_mapped_in_vma(page, vma)))
> >                                continue;  /* must visit all unlocked vmas */
> >                        ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
> >                } else {
> >                        ret = try_to_unmap_one(page, vma, migration);
> >                        if (ret == SWAP_FAIL || !page_mapped(page))
> >                                break;
> >                }
> >                if (ret == SWAP_MLOCK) {
> >                        mlocked = try_to_mlock_page(page, vma);
> >                        if (mlocked)
> >                                break;  /* stop if actually mlocked page */
> >                }
> >        }
> >
> > or that clause [under if (MLOCK_PAGES && unlikely(unlock))]
> > might be clearer as:
> >
> >               if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma))
> >                      ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
> >               else
> >                      continue;  /* must visit all unlocked vmas */
> >
> > Do you agree?
> 
> Hmmm.
> I don't think so.
> 
> >                        if (!((vma->vm_flags & VM_LOCKED) &&
> >                              page_mapped_in_vma(page, vma)))
> >                                continue;  /* must visit all unlocked vmas */
> 
> is already equivalent to
> 
> >               if ((vma->vm_flags & VM_LOCKED) && page_mapped_in_vma(page, vma))
> >                      ret = SWAP_MLOCK;  /* saw at least one mlocked vma */
> >               else
> >   
>                    continue;  /* must visit all unlocked vmas */

Hmmm, I should know not to try to read code when I'm that sleepy.  Had
myself convinced that the condition was wrong...

> 
> 
> > And, I wonder if we need a similar check for
> > page_mapped_in_vma(page, vma) up in try_to_unmap_one()?
> 
> because page_mapped_in_vma() can return 0 if vma is anon vma only.

by "vma is anon vma only", do you mean that the vma being tested--e.g.,
that we found to be VM_LOCKED--no longer has the page mapped in it's
page tables?  That is it's purpose--to detect this condition.  IIRC, Rik
added it during testing of the mlock patches when we discovered we were
mlocking pages because 

> 
> In the other word,
> struct adress_space (for file) gurantee that unrelated vma doesn't chained.

right.  that's why we don't have the page_mapped_in_vma() check in
try_to_unmap_file().

> but struct anon_vma (for anon) doesn't gurantee that unrelated vma
> doesn't chained.

Well, they're not exactly "unrelated"--vmas attached to an anon_vma are
from the same "family".  Any pages that haven't been COWed will still be
mapped into multiple mm's.

My question last night about try_to_unmap_one() wasn't really related to
the mlock statistics glitch.  Sorry I wasn't more clear about this.  I
was wondering about the case where shrink_page_list was trying to unmap
a page whose vma was on an anon_vma list with other VM_LOCKED vmas that
didn't actually map the page.  However, in the early morning light, I
see that the call to page_check_address() handles this.

----------
Anyway, our original responses to this report crossed in the mail.  You
said you'd handle it.  So, in the meantime, I'm looking at the
mmap()/vm_merge()/mlock_vma_pages_range() issue reported yesterday.

Regards,
Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-01-29 14:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-28 10:28 [BUG] mlocked page counter mismatch MinChan Kim
2009-01-28 14:38 ` KOSAKI Motohiro
2009-01-28 15:33 ` Lee Schermerhorn
2009-01-28 23:55   ` MinChan Kim
2009-01-28 23:57     ` MinChan Kim
2009-01-29  1:48     ` Lee Schermerhorn
2009-01-29  4:29       ` MinChan Kim
2009-01-29 12:35       ` KOSAKI Motohiro
2009-01-29 14:44         ` Lee Schermerhorn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox