* [PATCH v2] fix mlocked page counter mistmatch
@ 2009-02-03 4:24 MinChan Kim
2009-02-03 16:44 ` KOSAKI Motohiro
0 siblings, 1 reply; 13+ messages in thread
From: MinChan Kim @ 2009-02-03 4:24 UTC (permalink / raw)
To: Andrew Morton, linux mm
Cc: KOSAKI Motohiro, linux kernel, Nick Piggin, Rik van Riel
When I tested following program, I found that mlocked counter
is strange.
It couldn't free some mlocked pages of test program.
It is caused that try_to_unmap_file don't check real
page mapping in vmas.
That's because goal of address_space for file is to find all processes
into which the file's specific interval is mapped.
What I mean is that it's not related page but file's interval.
Even if the page isn't really mapping at the vma, it returns
SWAP_MLOCK since the vma have VM_LOCKED, then calls
try_to_mlock_page. After all, mlocked counter is increased again.
COWed anon page in a file-backed vma could be a such case.
This patch resolves it.
This patch is based on 2.6.28-rc2-mm1.
-- my test program --
int main()
{
mlockall(MCL_CURRENT);
return 0;
}
-- before --
root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable: 0 kB
Mlocked: 0 kB
-- after --
root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable: 8 kB
Mlocked: 8 kB
Signed-off-by: MinChan Kim <minchan.kim@gmail.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
---
mm/rmap.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 1099394..bd24b55 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1080,7 +1080,8 @@ static int try_to_unmap_file(struct page *page, int unlock, int migration)
spin_lock(&mapping->i_mmap_lock);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
if (MLOCK_PAGES && unlikely(unlock)) {
- if (!(vma->vm_flags & VM_LOCKED))
+ if (!((vma->vm_flags & VM_LOCKED) &&
+ page_mapped_in_vma(page, vma)))
continue; /* must visit all vmas */
ret = SWAP_MLOCK;
} else {
--
1.5.4.3
--
Kinds Regards
MinChan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-03 4:24 [PATCH v2] fix mlocked page counter mistmatch MinChan Kim
@ 2009-02-03 16:44 ` KOSAKI Motohiro
2009-02-03 23:44 ` MinChan Kim
0 siblings, 1 reply; 13+ messages in thread
From: KOSAKI Motohiro @ 2009-02-03 16:44 UTC (permalink / raw)
To: MinChan Kim
Cc: Andrew Morton, linux mm, linux kernel, Nick Piggin, Rik van Riel
Hi MinChan,
I'm confusing now.
Can you teach me?
> When I tested following program, I found that mlocked counter
> is strange.
> It couldn't free some mlocked pages of test program.
> It is caused that try_to_unmap_file don't check real
> page mapping in vmas.
What meanining is "real" page mapping?
> That's because goal of address_space for file is to find all processes
> into which the file's specific interval is mapped.
> What I mean is that it's not related page but file's interval.
hmmm. No.
I ran your reproduce program.
two vma pointing the same page cause this leaking.
iow, any library have .text and .data segment. then the tail of .text
and the head of .data vma point the same page.
its page was leaked.
> Even if the page isn't really mapping at the vma, it returns
> SWAP_MLOCK since the vma have VM_LOCKED, then calls
> try_to_mlock_page. After all, mlocked counter is increased again.
>
> COWed anon page in a file-backed vma could be a such case.
> This patch resolves it.
What meaning is "anon page in a file-backed"?
As far as I know, if cow happend on private mapping page, new page is
treated truth anon.
So, I don't reach to your conclusion yet. please teach me.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-03 16:44 ` KOSAKI Motohiro
@ 2009-02-03 23:44 ` MinChan Kim
2009-02-04 2:12 ` KOSAKI Motohiro
0 siblings, 1 reply; 13+ messages in thread
From: MinChan Kim @ 2009-02-03 23:44 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Andrew Morton, linux mm, linux kernel, Nick Piggin, Rik van Riel
On Wed, Feb 04, 2009 at 01:44:52AM +0900, KOSAKI Motohiro wrote:
> Hi MinChan,
>
> I'm confusing now.
> Can you teach me?
No problem. :)
>
> > When I tested following program, I found that mlocked counter
> > is strange.
> > It couldn't free some mlocked pages of test program.
> > It is caused that try_to_unmap_file don't check real
> > page mapping in vmas.
>
> What meanining is "real" page mapping?
What I mean is that if the page is mapped at the vma,
I call it's "real" page mapping.
I explain it more detaily below.
>
>
> > That's because goal of address_space for file is to find all processes
> > into which the file's specific interval is mapped.
> > What I mean is that it's not related page but file's interval.
>
> hmmm. No.
> I ran your reproduce program.
>
> two vma pointing the same page cause this leaking.
I don't think so.
>
> iow, any library have .text and .data segment. then the tail of .text
> and the head of .data vma point the same page.
> its page was leaked.
>
>
> > Even if the page isn't really mapping at the vma, it returns
> > SWAP_MLOCK since the vma have VM_LOCKED, then calls
> > try_to_mlock_page. After all, mlocked counter is increased again.
> >
> > COWed anon page in a file-backed vma could be a such case.
> > This patch resolves it.
>
> What meaning is "anon page in a file-backed"?
> As far as I know, if cow happend on private mapping page, new page is
> treated truth anon.
>
vm_area_struct's annotation can explain about your question.
struct vm_area_struct {
struct mm_struct * vm_mm; /* The address space we belong to. */
....
....
/*
* A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
* list, after a COW of one of the file pages. A MAP_SHARED vma
* can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
* or brk vma (with NULL file) can only be in an anon_vma list.
*/
struct list_head anon_vma_node; /* Serialized by anon_vma->lock */
struct anon_vma *anon_vma; /* Serialized by page_table_lock */
....
....
}
Let us call it anon page in a file-backed.
In this case, the new page is mapped at the vma.
the vma don't include old page any more but i_mmap tree still have
the vma.
So, the i_mmap tree can have the vma which don't include
the page if the one is anon page in a file-backed.
This problem is caused by that.
Is it enough ?
>
> So, I don't reach to your conclusion yet. please teach me.
--
Kinds Regards
MinChan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-03 23:44 ` MinChan Kim
@ 2009-02-04 2:12 ` KOSAKI Motohiro
2009-02-04 2:44 ` MinChan Kim
0 siblings, 1 reply; 13+ messages in thread
From: KOSAKI Motohiro @ 2009-02-04 2:12 UTC (permalink / raw)
To: MinChan Kim
Cc: kosaki.motohiro, Andrew Morton, linux mm, linux kernel,
Nick Piggin, Rik van Riel
> On Wed, Feb 04, 2009 at 01:44:52AM +0900, KOSAKI Motohiro wrote:
> > Hi MinChan,
> >
> > I'm confusing now.
> > Can you teach me?
>
> No problem. :)
>
> >
> > > When I tested following program, I found that mlocked counter
> > > is strange.
> > > It couldn't free some mlocked pages of test program.
> > > It is caused that try_to_unmap_file don't check real
> > > page mapping in vmas.
> >
> > What meanining is "real" page mapping?
>
> What I mean is that if the page is mapped at the vma,
> I call it's "real" page mapping.
> I explain it more detaily below.
>
> >
> >
> > > That's because goal of address_space for file is to find all processes
> > > into which the file's specific interval is mapped.
> > > What I mean is that it's not related page but file's interval.
> >
> > hmmm. No.
> > I ran your reproduce program.
> >
> > two vma pointing the same page cause this leaking.
>
> I don't think so.
Please confirm by actual machine and kernel.
I confirmed by printk debugging.
> > iow, any library have .text and .data segment. then the tail of .text
> > and the head of .data vma point the same page.
> > its page was leaked.
> >
> >
> > > Even if the page isn't really mapping at the vma, it returns
> > > SWAP_MLOCK since the vma have VM_LOCKED, then calls
> > > try_to_mlock_page. After all, mlocked counter is increased again.
> > >
> > > COWed anon page in a file-backed vma could be a such case.
> > > This patch resolves it.
> >
> > What meaning is "anon page in a file-backed"?
> > As far as I know, if cow happend on private mapping page, new page is
> > treated truth anon.
> >
>
> vm_area_struct's annotation can explain about your question.
>
> struct vm_area_struct {
> struct mm_struct * vm_mm; /* The address space we belong to. */
> ....
> ....
> /*
> * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
> * list, after a COW of one of the file pages. A MAP_SHARED vma
> * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
> * or brk vma (with NULL file) can only be in an anon_vma list.
> */
> struct list_head anon_vma_node; /* Serialized by anon_vma->lock */
> struct anon_vma *anon_vma; /* Serialized by page_table_lock */
> ....
> ....
> }
>
> Let us call it anon page in a file-backed.
> In this case, the new page is mapped at the vma.
> the vma don't include old page any more but i_mmap tree still have
> the vma.
hmhm. thanks.
my understanding largely improvement.
I agree page_check_address() checking is necessary.
> So, the i_mmap tree can have the vma which don't include
> the page if the one is anon page in a file-backed.
>
> This problem is caused by that.
> Is it enough ?
Could you please teach me why this issue doesn't happend on munlockall()?
your scenario seems to don't depend on exit_mmap().
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-04 2:12 ` KOSAKI Motohiro
@ 2009-02-04 2:44 ` MinChan Kim
2009-02-04 2:51 ` KOSAKI Motohiro
0 siblings, 1 reply; 13+ messages in thread
From: MinChan Kim @ 2009-02-04 2:44 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Andrew Morton, linux mm, linux kernel, Nick Piggin, Rik van Riel
On Wed, Feb 04, 2009 at 11:12:37AM +0900, KOSAKI Motohiro wrote:
> > On Wed, Feb 04, 2009 at 01:44:52AM +0900, KOSAKI Motohiro wrote:
> > > Hi MinChan,
> > >
> > > I'm confusing now.
> > > Can you teach me?
> >
> > No problem. :)
> >
> > >
> > > > When I tested following program, I found that mlocked counter
> > > > is strange.
> > > > It couldn't free some mlocked pages of test program.
> > > > It is caused that try_to_unmap_file don't check real
> > > > page mapping in vmas.
> > >
> > > What meanining is "real" page mapping?
> >
> > What I mean is that if the page is mapped at the vma,
> > I call it's "real" page mapping.
> > I explain it more detaily below.
> >
> > >
> > >
> > > > That's because goal of address_space for file is to find all processes
> > > > into which the file's specific interval is mapped.
> > > > What I mean is that it's not related page but file's interval.
> > >
> > > hmmm. No.
> > > I ran your reproduce program.
> > >
> > > two vma pointing the same page cause this leaking.
> >
> > I don't think so.
>
> Please confirm by actual machine and kernel.
> I confirmed by printk debugging.
>
It seems that we have a misundersting.
I think you can't understand my point. Sorry for my poor english.
You're right and i also already tested it, of course.
two vmas point to same address but have a different page due to COW.
So, What I mean is that problem is lack of page_check_address.
It causes this problem. :)
>
> > > iow, any library have .text and .data segment. then the tail of .text
> > > and the head of .data vma point the same page.
> > > its page was leaked.
> > >
> > >
> > > > Even if the page isn't really mapping at the vma, it returns
> > > > SWAP_MLOCK since the vma have VM_LOCKED, then calls
> > > > try_to_mlock_page. After all, mlocked counter is increased again.
> > > >
> > > > COWed anon page in a file-backed vma could be a such case.
> > > > This patch resolves it.
> > >
> > > What meaning is "anon page in a file-backed"?
> > > As far as I know, if cow happend on private mapping page, new page is
> > > treated truth anon.
> > >
> >
> > vm_area_struct's annotation can explain about your question.
> >
> > struct vm_area_struct {
> > struct mm_struct * vm_mm; /* The address space we belong to. */
> > ....
> > ....
> > /*
> > * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
> > * list, after a COW of one of the file pages. A MAP_SHARED vma
> > * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
> > * or brk vma (with NULL file) can only be in an anon_vma list.
> > */
> > struct list_head anon_vma_node; /* Serialized by anon_vma->lock */
> > struct anon_vma *anon_vma; /* Serialized by page_table_lock */
> > ....
> > ....
> > }
> >
> > Let us call it anon page in a file-backed.
> > In this case, the new page is mapped at the vma.
> > the vma don't include old page any more but i_mmap tree still have
> > the vma.
>
> hmhm. thanks.
> my understanding largely improvement.
>
> I agree page_check_address() checking is necessary.
>
> > So, the i_mmap tree can have the vma which don't include
> > the page if the one is anon page in a file-backed.
> >
> > This problem is caused by that.
> > Is it enough ?
>
> Could you please teach me why this issue doesn't happend on munlockall()?
> your scenario seems to don't depend on exit_mmap().
Good question.
It's a different issue.
It is related to mmap_sem locking issue.
Actually, I am about to make a patch.
But, I can't understand that Why try_do_mlock_page should downgrade mm_sem ?
Is it necessary ?
In munlockall path, mmap_sem already is holding in write-mode of mmap_sem.
so, try_to_mlock_page always fail to downgrade mmap_sem.
It's why it looks like working well about mlocked counter.
>
>
--
Kinds Regards
MinChan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-04 2:44 ` MinChan Kim
@ 2009-02-04 2:51 ` KOSAKI Motohiro
2009-02-04 4:57 ` MinChan Kim
0 siblings, 1 reply; 13+ messages in thread
From: KOSAKI Motohiro @ 2009-02-04 2:51 UTC (permalink / raw)
To: MinChan Kim
Cc: kosaki.motohiro, Andrew Morton, linux mm, linux kernel,
Nick Piggin, Rik van Riel
> > Could you please teach me why this issue doesn't happend on munlockall()?
> > your scenario seems to don't depend on exit_mmap().
>
>
> Good question.
> It's a different issue.
> It is related to mmap_sem locking issue.
>
> Actually, I am about to make a patch.
> But, I can't understand that Why try_do_mlock_page should downgrade mm_sem ?
> Is it necessary ?
>
> In munlockall path, mmap_sem already is holding in write-mode of mmap_sem.
> so, try_to_mlock_page always fail to downgrade mmap_sem.
> It's why it looks like working well about mlocked counter.
lastest linus tree don't have downgrade mmap_sem.
(recently it was removed)
please see it.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-04 2:51 ` KOSAKI Motohiro
@ 2009-02-04 4:57 ` MinChan Kim
2009-02-04 10:28 ` KOSAKI Motohiro
0 siblings, 1 reply; 13+ messages in thread
From: MinChan Kim @ 2009-02-04 4:57 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Andrew Morton, linux mm, linux kernel, Nick Piggin, Rik van Riel
On Wed, Feb 04, 2009 at 11:51:43AM +0900, KOSAKI Motohiro wrote:
> > > Could you please teach me why this issue doesn't happend on munlockall()?
> > > your scenario seems to don't depend on exit_mmap().
> >
> >
> > Good question.
> > It's a different issue.
> > It is related to mmap_sem locking issue.
> >
> > Actually, I am about to make a patch.
> > But, I can't understand that Why try_do_mlock_page should downgrade mm_sem ?
> > Is it necessary ?
> >
> > In munlockall path, mmap_sem already is holding in write-mode of mmap_sem.
> > so, try_to_mlock_page always fail to downgrade mmap_sem.
> > It's why it looks like working well about mlocked counter.
>
> lastest linus tree don't have downgrade mmap_sem.
> (recently it was removed)
Thnaks for information.
what is 'latest linus tree' ?
You mean '29-rc3-git5'?
With '29-rc3-git5', I found,
static int try_to_mlock_page(struct page *page, struct vm_area_struct *vma)
{
int mlocked = 0;
if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
if (vma->vm_flags & VM_LOCKED) {
mlock_vma_page(page);
mlocked++; /* really mlocked the page */
}
up_read(&vma->vm_mm->mmap_sem);
}
return mlocked;
}
It still try to downgrade mmap_sem.
Do I miss something ?
>
> please see it.
>
--
Kinds Regards
MinChan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-04 4:57 ` MinChan Kim
@ 2009-02-04 10:28 ` KOSAKI Motohiro
2009-02-04 14:07 ` Lee Schermerhorn
2009-02-04 23:35 ` MinChan Kim
0 siblings, 2 replies; 13+ messages in thread
From: KOSAKI Motohiro @ 2009-02-04 10:28 UTC (permalink / raw)
To: MinChan Kim
Cc: kosaki.motohiro, Andrew Morton, linux mm, linux kernel,
Nick Piggin, Rik van Riel
> With '29-rc3-git5', I found,
>
> static int try_to_mlock_page(struct page *page, struct vm_area_struct *vma)
> {
> int mlocked = 0;
>
> if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
> if (vma->vm_flags & VM_LOCKED) {
> mlock_vma_page(page);
> mlocked++; /* really mlocked the page */
> }
> up_read(&vma->vm_mm->mmap_sem);
> }
> return mlocked;
> }
>
> It still try to downgrade mmap_sem.
> Do I miss something ?
sorry, I misunderstood your "downgrade". I said linus removed downgrade_write(&mma_sem).
Now, I understand this issue perfectly. I agree you and lee-san's fix is correct.
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
and, I think current try_to_mlock_page() is correct. no need change.
Why?
1. Generally, mmap_sem holding is necessary when vma->vm_flags accessed.
that's vma's basic rule.
2. However, try_to_unmap_one() doesn't held mamp_sem. but that's ok.
it often get incorrect result. but caller consider incorrect value safe.
3. try_to_mlock_page() need mmap_sem because it obey rule (1).
4. in try_to_mlock_page(), if down_read_trylock() is failure,
we can't move the page to unevictable list. but that's ok.
the page in evictable list is periodically try to reclaim. and
be called try_to_unmap().
try_to_unmap() (and its caller) also move the unevictable page to unevictable list.
Therefore, in long term view, the page leak is not happend.
this explanation is enough?
thanks.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-04 10:28 ` KOSAKI Motohiro
@ 2009-02-04 14:07 ` Lee Schermerhorn
2009-02-04 23:38 ` MinChan Kim
2009-02-04 23:35 ` MinChan Kim
1 sibling, 1 reply; 13+ messages in thread
From: Lee Schermerhorn @ 2009-02-04 14:07 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: MinChan Kim, Andrew Morton, linux mm, linux kernel, Nick Piggin,
Rik van Riel
On Wed, 2009-02-04 at 19:28 +0900, KOSAKI Motohiro wrote:
> > With '29-rc3-git5', I found,
> >
> > static int try_to_mlock_page(struct page *page, struct vm_area_struct *vma)
> > {
> > int mlocked = 0;
> >
> > if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
> > if (vma->vm_flags & VM_LOCKED) {
> > mlock_vma_page(page);
> > mlocked++; /* really mlocked the page */
> > }
> > up_read(&vma->vm_mm->mmap_sem);
> > }
> > return mlocked;
> > }
> >
> > It still try to downgrade mmap_sem.
> > Do I miss something ?
>
> sorry, I misunderstood your "downgrade". I said linus removed downgrade_write(&mma_sem).
>
> Now, I understand this issue perfectly. I agree you and lee-san's fix is correct.
> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
>
> and, I think current try_to_mlock_page() is correct. no need change.
> Why?
>
> 1. Generally, mmap_sem holding is necessary when vma->vm_flags accessed.
> that's vma's basic rule.
> 2. However, try_to_unmap_one() doesn't held mamp_sem. but that's ok.
> it often get incorrect result. but caller consider incorrect value safe.
> 3. try_to_mlock_page() need mmap_sem because it obey rule (1).
> 4. in try_to_mlock_page(), if down_read_trylock() is failure,
> we can't move the page to unevictable list. but that's ok.
> the page in evictable list is periodically try to reclaim. and
> be called try_to_unmap().
> try_to_unmap() (and its caller) also move the unevictable page to unevictable list.
> Therefore, in long term view, the page leak is not happend.
>
Also worth noting that down_read_trylock() does not "downgrade" the
semaphore. It only tries to acquire it in read mode.
As Kosaki-san says, try_to_unmap() doesn't normally hold the mmap_sem.
It needs to acquire it here to stabilize the vma [vm_flags] while
mlocking the pages. This is the place where a page mapped in a
VM_LOCKED vma that vmscan found on the normal lru list--e.g., because we
couldn't isolate them in mlock_vma_page()--get marked mlocked, if not
already marked. mlock_vma_page() is a no-op if page is already mlocked.
If we successsfully acquire the mmap_sem and the vma is still VM_LOCKED,
we know that the page is mlocked and try_to_unmap() will return
SWAP_MLOCK. This allows vmscan [shrink_page_list()] to move the page to
the unevictable list and not need to bother with it in subsequent scans
until it becomes munlocked.
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-04 10:28 ` KOSAKI Motohiro
2009-02-04 14:07 ` Lee Schermerhorn
@ 2009-02-04 23:35 ` MinChan Kim
2009-02-05 2:17 ` KOSAKI Motohiro
1 sibling, 1 reply; 13+ messages in thread
From: MinChan Kim @ 2009-02-04 23:35 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Andrew Morton, linux mm, linux kernel, Nick Piggin, Rik van Riel
On Wed, Feb 04, 2009 at 07:28:19PM +0900, KOSAKI Motohiro wrote:
> > With '29-rc3-git5', I found,
> >
> > static int try_to_mlock_page(struct page *page, struct vm_area_struct *vma)
> > {
> > int mlocked = 0;
> >
> > if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
> > if (vma->vm_flags & VM_LOCKED) {
> > mlock_vma_page(page);
> > mlocked++; /* really mlocked the page */
> > }
> > up_read(&vma->vm_mm->mmap_sem);
> > }
> > return mlocked;
> > }
> >
> > It still try to downgrade mmap_sem.
> > Do I miss something ?
>
> sorry, I misunderstood your "downgrade". I said linus removed downgrade_write(&mma_sem).
>
> Now, I understand this issue perfectly. I agree you and lee-san's fix is correct.
> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Good. I will send adrew with your ACK agian.
>
>
> and, I think current try_to_mlock_page() is correct. no need change.
> Why?
>
> 1. Generally, mmap_sem holding is necessary when vma->vm_flags accessed.
> that's vma's basic rule.
> 2. However, try_to_unmap_one() doesn't held mamp_sem. but that's ok.
> it often get incorrect result. but caller consider incorrect value safe.
> 3. try_to_mlock_page() need mmap_sem because it obey rule (1).
> 4. in try_to_mlock_page(), if down_read_trylock() is failure,
> we can't move the page to unevictable list. but that's ok.
> the page in evictable list is periodically try to reclaim. and
> be called try_to_unmap().
> try_to_unmap() (and its caller) also move the unevictable page to unevictable list.
> Therefore, in long term view, the page leak is not happend.
Thanks for clarification.
In long term view, you're right.
but My concern is that munlock[all] pathes always hold down of mmap_sem.
After all, down_read_trylock always wil fail for such cases.
So, current task's mlocked pages only can be reclaimed
by background or direct reclaim path if the task don't exit.
I think it can increase reclaim overhead unnecessary
if there are lots of such tasks.
What's your opinion ?
>
> this explanation is enough?
>
> thanks.
>
--
Kinds Regards
MinChan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-04 14:07 ` Lee Schermerhorn
@ 2009-02-04 23:38 ` MinChan Kim
0 siblings, 0 replies; 13+ messages in thread
From: MinChan Kim @ 2009-02-04 23:38 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: KOSAKI Motohiro, Andrew Morton, linux mm, linux kernel,
Nick Piggin, Rik van Riel
On Wed, Feb 04, 2009 at 09:07:16AM -0500, Lee Schermerhorn wrote:
> On Wed, 2009-02-04 at 19:28 +0900, KOSAKI Motohiro wrote:
> > > With '29-rc3-git5', I found,
> > >
> > > static int try_to_mlock_page(struct page *page, struct vm_area_struct *vma)
> > > {
> > > int mlocked = 0;
> > >
> > > if (down_read_trylock(&vma->vm_mm->mmap_sem)) {
> > > if (vma->vm_flags & VM_LOCKED) {
> > > mlock_vma_page(page);
> > > mlocked++; /* really mlocked the page */
> > > }
> > > up_read(&vma->vm_mm->mmap_sem);
> > > }
> > > return mlocked;
> > > }
> > >
> > > It still try to downgrade mmap_sem.
> > > Do I miss something ?
> >
> > sorry, I misunderstood your "downgrade". I said linus removed downgrade_write(&mma_sem).
> >
> > Now, I understand this issue perfectly. I agree you and lee-san's fix is correct.
> > Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> >
> >
> > and, I think current try_to_mlock_page() is correct. no need change.
> > Why?
> >
> > 1. Generally, mmap_sem holding is necessary when vma->vm_flags accessed.
> > that's vma's basic rule.
> > 2. However, try_to_unmap_one() doesn't held mamp_sem. but that's ok.
> > it often get incorrect result. but caller consider incorrect value safe.
> > 3. try_to_mlock_page() need mmap_sem because it obey rule (1).
> > 4. in try_to_mlock_page(), if down_read_trylock() is failure,
> > we can't move the page to unevictable list. but that's ok.
> > the page in evictable list is periodically try to reclaim. and
> > be called try_to_unmap().
> > try_to_unmap() (and its caller) also move the unevictable page to unevictable list.
> > Therefore, in long term view, the page leak is not happend.
> >
>
> Also worth noting that down_read_trylock() does not "downgrade" the
> semaphore. It only tries to acquire it in read mode.
I selected wrong word. :(
>
> As Kosaki-san says, try_to_unmap() doesn't normally hold the mmap_sem.
> It needs to acquire it here to stabilize the vma [vm_flags] while
> mlocking the pages. This is the place where a page mapped in a
> VM_LOCKED vma that vmscan found on the normal lru list--e.g., because we
> couldn't isolate them in mlock_vma_page()--get marked mlocked, if not
> already marked. mlock_vma_page() is a no-op if page is already mlocked.
>
> If we successsfully acquire the mmap_sem and the vma is still VM_LOCKED,
> we know that the page is mlocked and try_to_unmap() will return
> SWAP_MLOCK. This allows vmscan [shrink_page_list()] to move the page to
> the unevictable list and not need to bother with it in subsequent scans
> until it becomes munlocked.
>
I answered my concern in Kosaki-san's reply.
> Lee
>
>
--
Kinds Regards
MinChan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-04 23:35 ` MinChan Kim
@ 2009-02-05 2:17 ` KOSAKI Motohiro
2009-02-05 2:32 ` MinChan Kim
0 siblings, 1 reply; 13+ messages in thread
From: KOSAKI Motohiro @ 2009-02-05 2:17 UTC (permalink / raw)
To: MinChan Kim
Cc: kosaki.motohiro, Andrew Morton, linux mm, linux kernel,
Nick Piggin, Rik van Riel, Lee Schermerhorn
> > and, I think current try_to_mlock_page() is correct. no need change.
> > Why?
> >
> > 1. Generally, mmap_sem holding is necessary when vma->vm_flags accessed.
> > that's vma's basic rule.
> > 2. However, try_to_unmap_one() doesn't held mamp_sem. but that's ok.
> > it often get incorrect result. but caller consider incorrect value safe.
> > 3. try_to_mlock_page() need mmap_sem because it obey rule (1).
> > 4. in try_to_mlock_page(), if down_read_trylock() is failure,
> > we can't move the page to unevictable list. but that's ok.
> > the page in evictable list is periodically try to reclaim. and
> > be called try_to_unmap().
> > try_to_unmap() (and its caller) also move the unevictable page to unevictable list.
> > Therefore, in long term view, the page leak is not happend.
>
> Thanks for clarification.
> In long term view, you're right.
>
> but My concern is that munlock[all] pathes always hold down of mmap_sem.
> After all, down_read_trylock always wil fail for such cases.
>
> So, current task's mlocked pages only can be reclaimed
> by background or direct reclaim path if the task don't exit.
>
> I think it can increase reclaim overhead unnecessary
> if there are lots of such tasks.
>
> What's your opinion ?
I have 2 comment.
1. typical application never munlock()ed at all.
and exit() path is already efficient.
then, I don't like hacky apploach.
2. I think we should drop mmap_sem holding in munlock path in the future.
at that time, this issue disappear automatically.
it's clean way more.
What do you think it?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2] fix mlocked page counter mistmatch
2009-02-05 2:17 ` KOSAKI Motohiro
@ 2009-02-05 2:32 ` MinChan Kim
0 siblings, 0 replies; 13+ messages in thread
From: MinChan Kim @ 2009-02-05 2:32 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: MinChan Kim, Andrew Morton, linux mm, linux kernel, Nick Piggin,
Rik van Riel, Lee Schermerhorn
On Thu, Feb 5, 2009 at 11:17 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> > and, I think current try_to_mlock_page() is correct. no need change.
>> > Why?
>> >
>> > 1. Generally, mmap_sem holding is necessary when vma->vm_flags accessed.
>> > that's vma's basic rule.
>> > 2. However, try_to_unmap_one() doesn't held mamp_sem. but that's ok.
>> > it often get incorrect result. but caller consider incorrect value safe.
>> > 3. try_to_mlock_page() need mmap_sem because it obey rule (1).
>> > 4. in try_to_mlock_page(), if down_read_trylock() is failure,
>> > we can't move the page to unevictable list. but that's ok.
>> > the page in evictable list is periodically try to reclaim. and
>> > be called try_to_unmap().
>> > try_to_unmap() (and its caller) also move the unevictable page to unevictable list.
>> > Therefore, in long term view, the page leak is not happend.
>>
>> Thanks for clarification.
>> In long term view, you're right.
>>
>> but My concern is that munlock[all] pathes always hold down of mmap_sem.
>> After all, down_read_trylock always wil fail for such cases.
>>
>> So, current task's mlocked pages only can be reclaimed
>> by background or direct reclaim path if the task don't exit.
>>
>> I think it can increase reclaim overhead unnecessary
>> if there are lots of such tasks.
>>
>> What's your opinion ?
>
> I have 2 comment.
>
> 1. typical application never munlock()ed at all.
Sometime application of embedded can do it.
That's becuase they want deterministic page allocation in some situation.
However, It's not a matter in here.
> and exit() path is already efficient.
> then, I don't like hacky apploach.
> 2. I think we should drop mmap_sem holding in munlock path in the future.
> at that time, this issue disappear automatically.
> it's clean way more.
If we can drop mmap_sem in munlock path, I am happy, too.
Please, CCed me if you make a patch for it.
By that time, I will fold this issue. :)
>
> What do you think it?
>
>
>
--
Thanks,
MinChan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-02-05 2:32 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-03 4:24 [PATCH v2] fix mlocked page counter mistmatch MinChan Kim
2009-02-03 16:44 ` KOSAKI Motohiro
2009-02-03 23:44 ` MinChan Kim
2009-02-04 2:12 ` KOSAKI Motohiro
2009-02-04 2:44 ` MinChan Kim
2009-02-04 2:51 ` KOSAKI Motohiro
2009-02-04 4:57 ` MinChan Kim
2009-02-04 10:28 ` KOSAKI Motohiro
2009-02-04 14:07 ` Lee Schermerhorn
2009-02-04 23:38 ` MinChan Kim
2009-02-04 23:35 ` MinChan Kim
2009-02-05 2:17 ` KOSAKI Motohiro
2009-02-05 2:32 ` MinChan Kim
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox