linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Race condition between putback_lru_page and mem_cgroup_move_list
@ 2008-08-04 14:36 MinChan Kim
  2008-08-04 15:31 ` Lee Schermerhorn
  0 siblings, 1 reply; 14+ messages in thread
From: MinChan Kim @ 2008-08-04 14:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki, Balbir Singh
  Cc: linux-mm, Rik van Riel, KOSAKI Motohiro, Lee Schermerhorn, LKML

I think this is a race condition if mem_cgroup_move_lists's comment isn't right.
I am not sure that it was already known problem.

mem_cgroup_move_lists assume the appropriate zone's lru lock is already held.
but putback_lru_page calls mem_cgroup_move_lists without holding lru_lock.

Repeatedly, spin_[un/lock]_irq use in mem_cgroup_move_list have a big overhead
while doing list iteration.

Do we have to use pagevec ?

-- 
Kinds regards,
MinChan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-04 14:36 Race condition between putback_lru_page and mem_cgroup_move_list MinChan Kim
@ 2008-08-04 15:31 ` Lee Schermerhorn
  2008-08-04 16:37   ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Lee Schermerhorn @ 2008-08-04 15:31 UTC (permalink / raw)
  To: MinChan Kim
  Cc: KAMEZAWA Hiroyuki, Balbir Singh, linux-mm, Rik van Riel,
	KOSAKI Motohiro, LKML

On Mon, 2008-08-04 at 23:36 +0900, MinChan Kim wrote:
> I think this is a race condition if mem_cgroup_move_lists's comment isn't right.
> I am not sure that it was already known problem.
> 
> mem_cgroup_move_lists assume the appropriate zone's lru lock is already held.
> but putback_lru_page calls mem_cgroup_move_lists without holding lru_lock.
> 

Hmmm, the comment on mem_cgroup_move_lists() does say this.  Although,
reading thru' the code, I can't see why it requires this.  But then it's
Monday, here...


> Repeatedly, spin_[un/lock]_irq use in mem_cgroup_move_list have a big overhead
> while doing list iteration.
> 
> Do we have to use pagevec ?

This shouldn't be necessary, IMO.  putback_lru_page() is used as
follows:

1) in vmscan.c [shrink_*_list()] when an unevictable page is
encountered.  This should be relatively rare.  Once vmscan sees an
unevictable page, it parks it on the unevictable lru list where it
[vmscan] won't see the page again until it becomes reclaimable.

2) as a replacement for move_to_lru() in page migration as the inverse
to isolate_lru_page().  We did this to catch patches that became
unevictable or, more importantly, evictable while page migration held
them isolated.  move_to_lru() already grabbed and released the zone lru
lock on each page migrated.

3) In m[un]lock_vma_page() and clear_page_mlock(), new with in the
"mlocked pages are unevictable" series.  This one can result in a storm
of zone lru traffic--e.g., mlock()ing or munlocking() a large segment or
mlockall() of a task with a lot of mapped address space.  Again, this is
probably a very rare event--unless you're stressing [stressing over?]
mlock(), as I've been doing :)--and often involves a major fault [page
allocation], per page anyway.

Ii>>? originally did have a pagevec for the unevictable lru but it
complicated ensuring that we don't strand evictable pages on the
unevictable list.  See the retry logic in putback_lru_page().

As for the !UNEVICTABLE_LRU version, the only place this should be
called is from page migration as none of the other call sites are
compiled in or reachable when !UNEVICTABLE_LRU.

Thoughts?

Lee
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-04 15:31 ` Lee Schermerhorn
@ 2008-08-04 16:37   ` KOSAKI Motohiro
  2008-08-04 17:52     ` Balbir Singh
  2008-08-05  3:49     ` kamezawa.hiroyu
  0 siblings, 2 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2008-08-04 16:37 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: MinChan Kim, KAMEZAWA Hiroyuki, Balbir Singh, linux-mm,
	Rik van Riel, LKML

Hi

>> I think this is a race condition if mem_cgroup_move_lists's comment isn't right.
>> I am not sure that it was already known problem.
>>
>> mem_cgroup_move_lists assume the appropriate zone's lru lock is already held.
>> but putback_lru_page calls mem_cgroup_move_lists without holding lru_lock.
>
> Hmmm, the comment on mem_cgroup_move_lists() does say this.  Although,
> reading thru' the code, I can't see why it requires this.  But then it's
> Monday, here...

I also think zone's lru lock is unnecessary.
So, I guess below "it" indicate lock_page_cgroup, not zone lru lock.

 >> But we cannot safely get to page_cgroup without it, so just try_lock it:

if my assumption is true, comment modifying is better.


>> Repeatedly, spin_[un/lock]_irq use in mem_cgroup_move_list have a big overhead
>> while doing list iteration.
>>
>> Do we have to use pagevec ?
>
> This shouldn't be necessary, IMO.  putback_lru_page() is used as
> follows:
>
> 1) in vmscan.c [shrink_*_list()] when an unevictable page is
> encountered.  This should be relatively rare.  Once vmscan sees an
> unevictable page, it parks it on the unevictable lru list where it
> [vmscan] won't see the page again until it becomes reclaimable.
>
> 2) as a replacement for move_to_lru() in page migration as the inverse
> to isolate_lru_page().  We did this to catch patches that became
> unevictable or, more importantly, evictable while page migration held
> them isolated.  move_to_lru() already grabbed and released the zone lru
> lock on each page migrated.
>
> 3) In m[un]lock_vma_page() and clear_page_mlock(), new with in the
> "mlocked pages are unevictable" series.  This one can result in a storm
> of zone lru traffic--e.g., mlock()ing or munlocking() a large segment or
> mlockall() of a task with a lot of mapped address space.  Again, this is
> probably a very rare event--unless you're stressing [stressing over?]
> mlock(), as I've been doing :)--and often involves a major fault [page
> allocation], per page anyway.
>
> I originally did have a pagevec for the unevictable lru but it
> complicated ensuring that we don't strand evictable pages on the
> unevictable list.  See the retry logic in putback_lru_page().
>
> As for the !UNEVICTABLE_LRU version, the only place this should be
> called is from page migration as none of the other call sites are
> compiled in or reachable when !UNEVICTABLE_LRU.
>
> Thoughts?

I think both opinion is correct.
unevictable lru related code doesn't require pagevec.

but mem_cgroup_move_lists is used by active/inactive list transition too.
then, pagevec is necessary for keeping reclaim throuput.

Kim-san, Thank you nice point out!
I queued this fix to my TODO list.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-04 16:37   ` KOSAKI Motohiro
@ 2008-08-04 17:52     ` Balbir Singh
  2008-08-04 23:52       ` MinChan Kim
                         ` (2 more replies)
  2008-08-05  3:49     ` kamezawa.hiroyu
  1 sibling, 3 replies; 14+ messages in thread
From: Balbir Singh @ 2008-08-04 17:52 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Lee Schermerhorn, MinChan Kim, KAMEZAWA Hiroyuki, linux-mm,
	Rik van Riel, LKML

KOSAKI Motohiro wrote:
> Hi
> 
>>> I think this is a race condition if mem_cgroup_move_lists's comment isn't right.
>>> I am not sure that it was already known problem.
>>>
>>> mem_cgroup_move_lists assume the appropriate zone's lru lock is already held.
>>> but putback_lru_page calls mem_cgroup_move_lists without holding lru_lock.
>> Hmmm, the comment on mem_cgroup_move_lists() does say this.  Although,
>> reading thru' the code, I can't see why it requires this.  But then it's
>> Monday, here...
> 
> I also think zone's lru lock is unnecessary.
> So, I guess below "it" indicate lock_page_cgroup, not zone lru lock.
> 

We need zone LRU lock, since the reclaim paths hold them. Not sure if I
understand why you call zone's LRU lock unnecessary, could you elaborate please?

>  >> But we cannot safely get to page_cgroup without it, so just try_lock it:
> 
> if my assumption is true, comment modifying is better.
> 
> 
>>> Repeatedly, spin_[un/lock]_irq use in mem_cgroup_move_list have a big overhead
>>> while doing list iteration.
>>>
>>> Do we have to use pagevec ?
>> This shouldn't be necessary, IMO.  putback_lru_page() is used as
>> follows:
>>
>> 1) in vmscan.c [shrink_*_list()] when an unevictable page is
>> encountered.  This should be relatively rare.  Once vmscan sees an
>> unevictable page, it parks it on the unevictable lru list where it
>> [vmscan] won't see the page again until it becomes reclaimable.
>>
>> 2) as a replacement for move_to_lru() in page migration as the inverse
>> to isolate_lru_page().  We did this to catch patches that became
>> unevictable or, more importantly, evictable while page migration held
>> them isolated.  move_to_lru() already grabbed and released the zone lru
>> lock on each page migrated.
>>
>> 3) In m[un]lock_vma_page() and clear_page_mlock(), new with in the
>> "mlocked pages are unevictable" series.  This one can result in a storm
>> of zone lru traffic--e.g., mlock()ing or munlocking() a large segment or
>> mlockall() of a task with a lot of mapped address space.  Again, this is
>> probably a very rare event--unless you're stressing [stressing over?]
>> mlock(), as I've been doing :)--and often involves a major fault [page
>> allocation], per page anyway.
>>
>> Ii>>? originally did have a pagevec for the unevictable lru but it
>> complicated ensuring that we don't strand evictable pages on the
>> unevictable list.  See the retry logic in putback_lru_page().
>>
>> As for the !UNEVICTABLE_LRU version, the only place this should be
>> called is from page migration as none of the other call sites are
>> compiled in or reachable when !UNEVICTABLE_LRU.
>>
>> Thoughts?
> 
> I think both opinion is correct.
> unevictable lru related code doesn't require pagevec.
> 
> but mem_cgroup_move_lists is used by active/inactive list transition too.
> then, pagevec is necessary for keeping reclaim throuput.
> 

It's on my TODO list. I hope to get to it soon.

> Kim-san, Thank you nice point out!
> I queued this fix to my TODO list.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-04 17:52     ` Balbir Singh
@ 2008-08-04 23:52       ` MinChan Kim
  2008-08-05  6:20       ` KOSAKI Motohiro
  2008-08-06 16:53       ` Lee Schermerhorn
  2 siblings, 0 replies; 14+ messages in thread
From: MinChan Kim @ 2008-08-04 23:52 UTC (permalink / raw)
  To: balbir
  Cc: KOSAKI Motohiro, Lee Schermerhorn, KAMEZAWA Hiroyuki, linux-mm,
	Rik van Riel, LKML

On Tue, Aug 5, 2008 at 2:52 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> KOSAKI Motohiro wrote:
>> Hi
>>
>>>> I think this is a race condition if mem_cgroup_move_lists's comment isn't right.
>>>> I am not sure that it was already known problem.
>>>>
>>>> mem_cgroup_move_lists assume the appropriate zone's lru lock is already held.
>>>> but putback_lru_page calls mem_cgroup_move_lists without holding lru_lock.
>>> Hmmm, the comment on mem_cgroup_move_lists() does say this.  Although,
>>> reading thru' the code, I can't see why it requires this.  But then it's
>>> Monday, here...
>>
>> I also think zone's lru lock is unnecessary.
>> So, I guess below "it" indicate lock_page_cgroup, not zone lru lock.
>>
>
> We need zone LRU lock, since the reclaim paths hold them. Not sure if I

Could you explan why you need lru_lock more exact ?
I think it is need by race condition with global reclaim.
Are there any other cause ?

> understand why you call zone's LRU lock unnecessary, could you elaborate please?
>
>>  >> But we cannot safely get to page_cgroup without it, so just try_lock it:
>>
>> if my assumption is true, comment modifying is better.
>>
>>
>>>> Repeatedly, spin_[un/lock]_irq use in mem_cgroup_move_list have a big overhead
>>>> while doing list iteration.
>>>>
>>>> Do we have to use pagevec ?
>>> This shouldn't be necessary, IMO.  putback_lru_page() is used as
>>> follows:
>>>
>>> 1) in vmscan.c [shrink_*_list()] when an unevictable page is
>>> encountered.  This should be relatively rare.  Once vmscan sees an
>>> unevictable page, it parks it on the unevictable lru list where it
>>> [vmscan] won't see the page again until it becomes reclaimable.
>>>
>>> 2) as a replacement for move_to_lru() in page migration as the inverse
>>> to isolate_lru_page().  We did this to catch patches that became
>>> unevictable or, more importantly, evictable while page migration held
>>> them isolated.  move_to_lru() already grabbed and released the zone lru
>>> lock on each page migrated.
>>>
>>> 3) In m[un]lock_vma_page() and clear_page_mlock(), new with in the
>>> "mlocked pages are unevictable" series.  This one can result in a storm
>>> of zone lru traffic--e.g., mlock()ing or munlocking() a large segment or
>>> mlockall() of a task with a lot of mapped address space.  Again, this is
>>> probably a very rare event--unless you're stressing [stressing over?]
>>> mlock(), as I've been doing :)--and often involves a major fault [page
>>> allocation], per page anyway.
>>>
>>> I originally did have a pagevec for the unevictable lru but it
>>> complicated ensuring that we don't strand evictable pages on the
>>> unevictable list.  See the retry logic in putback_lru_page().
>>>
>>> As for the !UNEVICTABLE_LRU version, the only place this should be
>>> called is from page migration as none of the other call sites are
>>> compiled in or reachable when !UNEVICTABLE_LRU.
>>>
>>> Thoughts?
>>
>> I think both opinion is correct.
>> unevictable lru related code doesn't require pagevec.
>>
>> but mem_cgroup_move_lists is used by active/inactive list transition too.
>> then, pagevec is necessary for keeping reclaim throuput.
>>
>
> It's on my TODO list. I hope to get to it soon.
>
>> Kim-san, Thank you nice point out!
>> I queued this fix to my TODO list.
>
>
> --
>        Warm Regards,
>        Balbir Singh
>        Linux Technology Center
>        IBM, ISTL
>
>



-- 
Kinds regards,
MinChan Kim

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-04 16:37   ` KOSAKI Motohiro
  2008-08-04 17:52     ` Balbir Singh
@ 2008-08-05  3:49     ` kamezawa.hiroyu
  1 sibling, 0 replies; 14+ messages in thread
From: kamezawa.hiroyu @ 2008-08-05  3:49 UTC (permalink / raw)
  To: balbir
  Cc: KOSAKI Motohiro, Lee Schermerhorn, MinChan Kim,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, LKML

>KOSAKI Motohiro wrote:
>> Hi
>> 
>>>> I think this is a race condition if mem_cgroup_move_lists's comment isn't
 right.
>>>> I am not sure that it was already known problem.
>>>>
>>>> mem_cgroup_move_lists assume the appropriate zone's lru lock is already h
eld.
>>>> but putback_lru_page calls mem_cgroup_move_lists without holding lru_lock
.
>>> Hmmm, the comment on mem_cgroup_move_lists() does say this.  Although,
>>> reading thru' the code, I can't see why it requires this.  But then it's
>>> Monday, here...
>> 
>> I also think zone's lru lock is unnecessary.
>> So, I guess below "it" indicate lock_page_cgroup, not zone lru lock.
>> 
>
>We need zone LRU lock, since the reclaim paths hold them. Not sure if I
>understand why you call zone's LRU lock unnecessary, could you elaborate plea
se?
>

I guess the comment should be against mem_cgroup_isolate_pages()...

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-04 17:52     ` Balbir Singh
  2008-08-04 23:52       ` MinChan Kim
@ 2008-08-05  6:20       ` KOSAKI Motohiro
  2008-08-05 10:46         ` Balbir Singh
  2008-08-06 16:53       ` Lee Schermerhorn
  2 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2008-08-05  6:20 UTC (permalink / raw)
  To: balbir
  Cc: kosaki.motohiro, Lee Schermerhorn, MinChan Kim,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, LKML

Hi Balbir-san,

> > I also think zone's lru lock is unnecessary.
> > So, I guess below "it" indicate lock_page_cgroup, not zone lru lock.
> 
> We need zone LRU lock, since the reclaim paths hold them. Not sure if I
> understand why you call zone's LRU lock unnecessary, could you elaborate please?

I tought..

1. in general, one data structure should be protected by one lock.
2. memcgroup lru is protected by mem_cgroup_per_zone::lru_lock.


if zone LRU lock must be held, Why do mem_cgroup_per_zone::lru_lock exit?
it should be removed?


Could you explain detail of "race condition with global reclaim race" ?



> > I think both opinion is correct.
> > unevictable lru related code doesn't require pagevec.
> > 
> > but mem_cgroup_move_lists is used by active/inactive list transition too.
> > then, pagevec is necessary for keeping reclaim throuput.
> > 
> 
> It's on my TODO list. I hope to get to it soon.

Very good news!
Thanks.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-05  6:20       ` KOSAKI Motohiro
@ 2008-08-05 10:46         ` Balbir Singh
  2008-08-05 11:19           ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Balbir Singh @ 2008-08-05 10:46 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Lee Schermerhorn, MinChan Kim, KAMEZAWA Hiroyuki, linux-mm,
	Rik van Riel, LKML

KOSAKI Motohiro wrote:
> Hi Balbir-san,
> 
>>> I also think zone's lru lock is unnecessary.
>>> So, I guess below "it" indicate lock_page_cgroup, not zone lru lock.
>> We need zone LRU lock, since the reclaim paths hold them. Not sure if I
>> understand why you call zone's LRU lock unnecessary, could you elaborate please?
> 
> I tought..
> 
> 1. in general, one data structure should be protected by one lock.

In general yes, but in practice no. We have different paths through which a page
can be reclaimed. Consider the following

1. What happens if a global reclaim is in progress at the same time as memory
cgroup reclaim and they are both looking at the same page?
2. In the shared reclaim infrastructure, we move pages and update statistics for
pages belonging to a particular zone in a particular cgroup.

>> It's on my TODO list. I hope to get to it soon.
> 
> Very good news!

I hope they show the benefit that I expect them too :)

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-05 10:46         ` Balbir Singh
@ 2008-08-05 11:19           ` KOSAKI Motohiro
  2008-08-05 11:38             ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2008-08-05 11:19 UTC (permalink / raw)
  To: balbir
  Cc: kosaki.motohiro, Lee Schermerhorn, MinChan Kim,
	KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, LKML

> In general yes, but in practice no. We have different paths through which a page
> can be reclaimed. Consider the following
> 
> 1. What happens if a global reclaim is in progress at the same time as memory
> cgroup reclaim and they are both looking at the same page?
> 2. In the shared reclaim infrastructure, we move pages and update statistics for
> pages belonging to a particular zone in a particular cgroup.

hehe, you said mem_cgroup_per_zone::lru_lock is unnecessary lock.

Also, we can two approach

  1. the pages are allowed to exist in different zone of memcg zone and 
     global zone.
     and recover later (by mem_cgroup_isolate_pages).
     here is current implementation.
  2. the pages aren't allowed to exist in different zone of memcg zone and 
     global zone.
     (you said its mail)

if we select 2, I hope mem_cgroup_move_lists is called by ____pagevec_lru_add
and add_page_to_unevictable_list.
then, page's global lru transition become memcg lru transition automatically.
it increase source code readability.




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-05 11:19           ` KOSAKI Motohiro
@ 2008-08-05 11:38             ` KOSAKI Motohiro
  0 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2008-08-05 11:38 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: balbir, Lee Schermerhorn, MinChan Kim, KAMEZAWA Hiroyuki,
	linux-mm, Rik van Riel, LKML

> Also, we can two approach

Also, we can *select* two approach.


sorry, send crap broken message ;-)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-04 17:52     ` Balbir Singh
  2008-08-04 23:52       ` MinChan Kim
  2008-08-05  6:20       ` KOSAKI Motohiro
@ 2008-08-06 16:53       ` Lee Schermerhorn
  2008-08-07 11:00         ` KOSAKI Motohiro
  2008-08-07 12:42         ` Balbir Singh
  2 siblings, 2 replies; 14+ messages in thread
From: Lee Schermerhorn @ 2008-08-06 16:53 UTC (permalink / raw)
  To: balbir, KOSAKI Motohiro
  Cc: MinChan Kim, KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, LKML

On Mon, 2008-08-04 at 23:22 +0530, Balbir Singh wrote:
> KOSAKI Motohiro wrote:
> > Hi
> > 
> >>> I think this is a race condition if mem_cgroup_move_lists's comment isn't right.
> >>> I am not sure that it was already known problem.
> >>>
> >>> mem_cgroup_move_lists assume the appropriate zone's lru lock is already held.
> >>> but putback_lru_page calls mem_cgroup_move_lists without holding lru_lock.
> >> Hmmm, the comment on mem_cgroup_move_lists() does say this.  Although,
> >> reading thru' the code, I can't see why it requires this.  But then it's
> >> Monday, here...
> > 
> > I also think zone's lru lock is unnecessary.
> > So, I guess below "it" indicate lock_page_cgroup, not zone lru lock.
> > 
> 
> We need zone LRU lock, since the reclaim paths hold them. Not sure if I
> understand why you call zone's LRU lock unnecessary, could you elaborate please?

Hi, Balbir:

Sorry for the delay in responding.  Distracted...

I think that perhaps the zone's LRU lock is unnecessary because I didn't
see anything in mem_cgroup_move_lists() or it's callees that needed
protection by the zone lru_lock.  

Looking at the call sites in the reclaim paths [in
shrink_[in]active_page()] and activate_page(), they are holding the zone
lru_lock because they are manipulating the lru lists and/or zone
statistics.  The places where pages are moved to a new lru list is where
you want to insert calls to mem_cgroup_move_lists(), so I think they
just happen to fall under the zone lru lock.  

Now, in a subsequent message in this thread, you ask:

"1. What happens if a global reclaim is in progress at the same time as
memory cgroup reclaim and they are both looking at the same page?"

This should not happen, I think.  Racing global and memory cgroup calls
to __isolate_lru_page() are mutually excluded by the zone lru_lock taken
in shrink_[in]active_page().  In putback_lru_page(), where we call
mem_cgroup_move_lists() without holding the zone lru_lock, we've either
queued up the page for adding to one of the [in]active lists via the
pagevecs, or we've already moved it to the unevictable list.  If
mem_cgroup_isolate_pages() finds a page on one of the mz lists before it
has been drained to the LRU, it will [rightly] skip the page because
it's "!PageLRU(page)".


In same message, you state:

"2. In the shared reclaim infrastructure, we move pages and update
statistics for pages belonging to a particular zone in a particular
cgroup."

Sorry, I don't understand your point.  Are you concerned that the stats
can get out of sync?  I suppose that, in general, if we called
mem_cgroup_move_lists() from just anywhere without zone lru_lock
protection, we could have problems.  In the case of putback_lru_page(),
again, we've already put the page back on the global unevictable list
and updated the global stats, or it's on it's way to an [in]active list
via the pagevecs.  The stats will be updated when the pagevecs are
drained.

I think we're OK without explicit zone lru locking around the call to
mem_cgroup_move_lists() and the global lru list additions in
putback_lru_page().   

> 
> >  >> But we cannot safely get to page_cgroup without it, so just try_lock it:
> > 
> > if my assumption is true, comment modifying is better.
> > 
> > 
> >>> Repeatedly, spin_[un/lock]_irq use in mem_cgroup_move_list have a big overhead
> >>> while doing list iteration.
> >>>
> >>> Do we have to use pagevec ?
> >> This shouldn't be necessary, IMO.  putback_lru_page() is used as
> >> follows:
> >>
> >> 1) in vmscan.c [shrink_*_list()] when an unevictable page is
> >> encountered.  This should be relatively rare.  Once vmscan sees an
> >> unevictable page, it parks it on the unevictable lru list where it
> >> [vmscan] won't see the page again until it becomes reclaimable.
> >>
> >> 2) as a replacement for move_to_lru() in page migration as the inverse
> >> to isolate_lru_page().  We did this to catch patches that became
> >> unevictable or, more importantly, evictable while page migration held
> >> them isolated.  move_to_lru() already grabbed and released the zone lru
> >> lock on each page migrated.
> >>
> >> 3) In m[un]lock_vma_page() and clear_page_mlock(), new with in the
> >> "mlocked pages are unevictable" series.  This one can result in a storm
> >> of zone lru traffic--e.g., mlock()ing or munlocking() a large segment or
> >> mlockall() of a task with a lot of mapped address space.  Again, this is
> >> probably a very rare event--unless you're stressing [stressing over?]
> >> mlock(), as I've been doing :)--and often involves a major fault [page
> >> allocation], per page anyway.
> >>
> >> Ii>>? originally did have a pagevec for the unevictable lru but it
> >> complicated ensuring that we don't strand evictable pages on the
> >> unevictable list.  See the retry logic in putback_lru_page().
> >>
> >> As for the !UNEVICTABLE_LRU version, the only place this should be
> >> called is from page migration as none of the other call sites are
> >> compiled in or reachable when !UNEVICTABLE_LRU.
> >>
> >> Thoughts?
> > 
> > I think both opinion is correct.
> > unevictable lru related code doesn't require pagevec.
> > 
> > but mem_cgroup_move_lists is used by active/inactive list transition too.
> > then, pagevec is necessary for keeping reclaim throuput.
> > 

Kosaki-san:

If you mean the "active/inactive list transition" in
shrink_[in]active_list(), these are already batched under zone lru_lock
with batch size determined by the 'release pages' pvec.  So, I think
we're OK here.

If you mean in "activate_page()", it appears to handle one page at a
time to keep the stats in sync.  Not sure whether it's amenable to a
pagevec approach.  In any case, the FIXME comment there asks if it can
be sped up and adding the call to mem_cgroup_move_lists() probably
didn't [speed it up, I mean].  So, have at it! :)

> 
> It's on my TODO list. I hope to get to it soon.
> 
> > Kim-san, Thank you nice point out!
> > I queued this fix to my TODO list.
> 

Yes.  Thanks for reviewing this.

Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-06 16:53       ` Lee Schermerhorn
@ 2008-08-07 11:00         ` KOSAKI Motohiro
  2008-08-07 11:27           ` Lee Schermerhorn
  2008-08-07 12:42         ` Balbir Singh
  1 sibling, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2008-08-07 11:00 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: kosaki.motohiro, balbir, MinChan Kim, KAMEZAWA Hiroyuki,
	linux-mm, Rik van Riel, LKML

Hi

> If you mean the "active/inactive list transition" in
> shrink_[in]active_list(), these are already batched under zone lru_lock
> with batch size determined by the 'release pages' pvec.  So, I think
> we're OK here.

No.

AFAIK shrink_inactive_list batched zone->lru_lock, 
but it doesn't batched mz->lru_lock.

then, spin_lock_irqsave is freqently called.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-07 11:00         ` KOSAKI Motohiro
@ 2008-08-07 11:27           ` Lee Schermerhorn
  0 siblings, 0 replies; 14+ messages in thread
From: Lee Schermerhorn @ 2008-08-07 11:27 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: balbir, MinChan Kim, KAMEZAWA Hiroyuki, linux-mm, Rik van Riel, LKML

On Thu, 2008-08-07 at 20:00 +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > If you mean the "active/inactive list transition" in
> > shrink_[in]active_list(), these are already batched under zone lru_lock
> > with batch size determined by the 'release pages' pvec.  So, I think
> > we're OK here.
> 
> No.
> 
> AFAIK shrink_inactive_list batched zone->lru_lock, 
> but it doesn't batched mz->lru_lock.
> 
> then, spin_lock_irqsave is freqently called.

Ah, I see what you mean.  Yes, the mem cgroup zone lru_lock will be
cycled frequently as each back of pages is put back during reclaim.  So,
you'd like to eliminate the mz lru_lock, move the mem cgroup zone info
under the corresponding zone lru_lock and move the page between memcg
lists atomically with adding to global lru lists?  

Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Race condition between putback_lru_page and mem_cgroup_move_list
  2008-08-06 16:53       ` Lee Schermerhorn
  2008-08-07 11:00         ` KOSAKI Motohiro
@ 2008-08-07 12:42         ` Balbir Singh
  1 sibling, 0 replies; 14+ messages in thread
From: Balbir Singh @ 2008-08-07 12:42 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: KOSAKI Motohiro, MinChan Kim, KAMEZAWA Hiroyuki, linux-mm,
	Rik van Riel, LKML

Lee Schermerhorn wrote:
> On Mon, 2008-08-04 at 23:22 +0530, Balbir Singh wrote:
>> KOSAKI Motohiro wrote:
>>> Hi
>>>
>>>>> I think this is a race condition if mem_cgroup_move_lists's comment isn't right.
>>>>> I am not sure that it was already known problem.
>>>>>
>>>>> mem_cgroup_move_lists assume the appropriate zone's lru lock is already held.
>>>>> but putback_lru_page calls mem_cgroup_move_lists without holding lru_lock.
>>>> Hmmm, the comment on mem_cgroup_move_lists() does say this.  Although,
>>>> reading thru' the code, I can't see why it requires this.  But then it's
>>>> Monday, here...
>>> I also think zone's lru lock is unnecessary.
>>> So, I guess below "it" indicate lock_page_cgroup, not zone lru lock.
>>>
>> We need zone LRU lock, since the reclaim paths hold them. Not sure if I
>> understand why you call zone's LRU lock unnecessary, could you elaborate please?
> 
> Hi, Balbir:
> 
> Sorry for the delay in responding.  Distracted...
> 

No problem at all.

> I think that perhaps the zone's LRU lock is unnecessary because I didn't
> see anything in mem_cgroup_move_lists() or it's callees that needed
> protection by the zone lru_lock.  
> 
> Looking at the call sites in the reclaim paths [in
> shrink_[in]active_page()] and activate_page(), they are holding the zone
> lru_lock because they are manipulating the lru lists and/or zone
> statistics. 

Precisely, my point below about updating statistics for zones and you mention
below that the zone LRU excludes the race I mentioned in (1). I am a bit
confused with that statement, do you agree that zone lru_lock excludes the race
and is therefore required?

 The places where pages are moved to a new lru list is where
> you want to insert calls to mem_cgroup_move_lists(), so I think they
> just happen to fall under the zone lru lock.  
> 
> Now, in a subsequent message in this thread, you ask:
> 
> "1. What happens if a global reclaim is in progress at the same time as
> memory cgroup reclaim and they are both looking at the same page?"
> 
> This should not happen, I think.  Racing global and memory cgroup calls
> to __isolate_lru_page() are mutually excluded by the zone lru_lock taken
> in shrink_[in]active_page().

Yes, I was referring to needing the zone lru_lock

  In putback_lru_page(), where we call
> mem_cgroup_move_lists() without holding the zone lru_lock, we've either
> queued up the page for adding to one of the [in]active lists via the
> pagevecs, or we've already moved it to the unevictable list.  If
> mem_cgroup_isolate_pages() finds a page on one of the mz lists before it
> has been drained to the LRU, it will [rightly] skip the page because
> it's "!PageLRU(page)".
> 
> 
> In same message, you state:
> 
> "2. In the shared reclaim infrastructure, we move pages and update
> statistics for pages belonging to a particular zone in a particular
> cgroup."
> 
> Sorry, I don't understand your point.  Are you concerned that the stats
> can get out of sync?  I suppose that, in general, if we called
> mem_cgroup_move_lists() from just anywhere without zone lru_lock
> protection, we could have problems.  In the case of putback_lru_page(),
> again, we've already put the page back on the global unevictable list
> and updated the global stats, or it's on it's way to an [in]active list
> via the pagevecs.  The stats will be updated when the pagevecs are
> drained.
> 
> I think we're OK without explicit zone lru locking around the call to
> mem_cgroup_move_lists() and the global lru list additions in
> putback_lru_page().   
> 

I think I understand what you are stating clearly

We don't need the zone lru_lock in putback_lru_page(). Am I missing something or
do I have it right? (It's Thursday and one of my legs is already in the weekend).


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2008-08-07 12:42 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-08-04 14:36 Race condition between putback_lru_page and mem_cgroup_move_list MinChan Kim
2008-08-04 15:31 ` Lee Schermerhorn
2008-08-04 16:37   ` KOSAKI Motohiro
2008-08-04 17:52     ` Balbir Singh
2008-08-04 23:52       ` MinChan Kim
2008-08-05  6:20       ` KOSAKI Motohiro
2008-08-05 10:46         ` Balbir Singh
2008-08-05 11:19           ` KOSAKI Motohiro
2008-08-05 11:38             ` KOSAKI Motohiro
2008-08-06 16:53       ` Lee Schermerhorn
2008-08-07 11:00         ` KOSAKI Motohiro
2008-08-07 11:27           ` Lee Schermerhorn
2008-08-07 12:42         ` Balbir Singh
2008-08-05  3:49     ` kamezawa.hiroyu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox