* [BUGFIX][PATCH mmotm] memcg fix swap accounting leak
@ 2008-12-12 8:29 KAMEZAWA Hiroyuki
2008-12-12 9:43 ` Daisuke Nishimura
0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-12 8:29 UTC (permalink / raw)
To: linux-mm; +Cc: linux-kernel, balbir, nishimura, akpm
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Fix a swap-page-fault charge leak in memcg.

Now, memcg has hooks into the swap-out operation and checks whether a
SwapCache page is really unused or not. That check depends on the contents
of struct page, i.e. if PageAnon(page) && page_mapped(page), the page is
recognized as still-in-use.

Now, reuse_swap_page() calls delete_from_swap_cache() before any rmap is
established. Then, in the following sequence

  (Page fault with WRITE)
    Assume the page is SwapCache "on memory (still charged)"
    try_charge()            (charge += PAGE_SIZE)
    commit_charge()
      => (Checks page_cgroup, finds the PCG_USED bit set, and does
          charge -= PAGE_SIZE because the page seems already charged.)
    reuse_swap_page()
      -> delete_from_swap_cache()
        -> mem_cgroup_uncharge_swapcache()   (charge -= PAGE_SIZE)
    ......

the page is uncharged once too often.

To avoid this, move commit_charge() to after page_mapcount() goes up to 1.
With this,

    Assume the page is SwapCache "on memory"
    try_charge()            (charge += PAGE_SIZE)
    reuse_swap_page()       (may do charge -= PAGE_SIZE if PCG_USED is set)
    commit_charge()         (only if page_cgroup is marked as PCG_USED,
                             charge -= PAGE_SIZE)

accounting will be correct.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: mmotm-2.6.28-Dec11/mm/memory.c
===================================================================
--- mmotm-2.6.28-Dec11.orig/mm/memory.c
+++ mmotm-2.6.28-Dec11/mm/memory.c
@@ -2433,17 +2433,17 @@ static int do_swap_page(struct mm_struct
* which may delete_from_swap_cache().
*/
- mem_cgroup_commit_charge_swapin(page, ptr);
inc_mm_counter(mm, anon_rss);
pte = mk_pte(page, vma->vm_page_prot);
if (write_access && reuse_swap_page(page)) {
pte = maybe_mkwrite(pte_mkdirty(pte), vma);
write_access = 0;
}
-
flush_icache_page(vma, page);
set_pte_at(mm, address, page_table, pte);
page_add_anon_rmap(page, vma, address);
+ /* It's better to call commit-charge after rmap is established */
+ mem_cgroup_commit_charge_swapin(page, ptr);
swap_free(entry);
if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak
  2008-12-12  8:29 [BUGFIX][PATCH mmotm] memcg fix swap accounting leak KAMEZAWA Hiroyuki
@ 2008-12-12  9:43 ` Daisuke Nishimura
  2008-12-12 11:16   ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Daisuke Nishimura @ 2008-12-12 9:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, balbir, akpm, Hugh Dickins, nishimura

(add CC: Hugh Dickins <hugh@veritas.com>)

On Fri, 12 Dec 2008 17:29:30 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Fix a swap-page-fault charge leak in memcg.
>
> Now, memcg has hooks into the swap-out operation and checks whether a
> SwapCache page is really unused or not. That check depends on the contents
> of struct page, i.e. if PageAnon(page) && page_mapped(page), the page is
> recognized as still-in-use.
>
> Now, reuse_swap_page() calls delete_from_swap_cache() before any rmap is
> established. Then, in the following sequence
>
>   (Page fault with WRITE)
>     Assume the page is SwapCache "on memory (still charged)"
>     try_charge()            (charge += PAGE_SIZE)
>     commit_charge()
>       => (Checks page_cgroup, finds the PCG_USED bit set, and does
>           charge -= PAGE_SIZE because the page seems already charged.)
>     reuse_swap_page()
>       -> delete_from_swap_cache()
>         -> mem_cgroup_uncharge_swapcache()   (charge -= PAGE_SIZE)
>     ......
>
> the page is uncharged once too often.
>
> To avoid this, move commit_charge() to after page_mapcount() goes up to 1.
> With this,
>
>     Assume the page is SwapCache "on memory"
>     try_charge()            (charge += PAGE_SIZE)
>     reuse_swap_page()       (may do charge -= PAGE_SIZE if PCG_USED is set)
>     commit_charge()         (only if page_cgroup is marked as PCG_USED,
>                              charge -= PAGE_SIZE)
>
> accounting will be correct.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

I've confirmed that the problem you reported offline is fixed, but...

> ---
>  mm/memory.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: mmotm-2.6.28-Dec11/mm/memory.c
> ===================================================================
> --- mmotm-2.6.28-Dec11.orig/mm/memory.c
> +++ mmotm-2.6.28-Dec11/mm/memory.c
> @@ -2433,17 +2433,17 @@ static int do_swap_page(struct mm_struct
>  	 * which may delete_from_swap_cache().
>  	 */
>

The comment here says:

	/*
	 * The page isn't present yet, go ahead with the fault.
	 *
	 * Be careful about the sequence of operations here.
	 * To get its accounting right, reuse_swap_page() must be called
	 * while the page is counted on swap but not yet in mapcount i.e.
	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
	 * must be called after the swap_free(), or it will never succeed.
	 * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
	 * in page->private, must be called before reuse_swap_page(),
	 * which may delete_from_swap_cache().
	 */

Hmm.. should we save page->private before calling reuse_swap_page() and
pass it to mem_cgroup_commit_charge_swapin()? (I think it cannot be
changed while the page is locked.)

Thanks,
Daisuke Nishimura.

> -	mem_cgroup_commit_charge_swapin(page, ptr);
>  	inc_mm_counter(mm, anon_rss);
>  	pte = mk_pte(page, vma->vm_page_prot);
>  	if (write_access && reuse_swap_page(page)) {
>  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>  		write_access = 0;
>  	}
> -
>  	flush_icache_page(vma, page);
>  	set_pte_at(mm, address, page_table, pte);
>  	page_add_anon_rmap(page, vma, address);
> +	/* It's better to call commit-charge after rmap is established */
> +	mem_cgroup_commit_charge_swapin(page, ptr);
>
>  	swap_free(entry);
>  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak
  2008-12-12  9:43 ` Daisuke Nishimura
@ 2008-12-12 11:16   ` KAMEZAWA Hiroyuki
  2008-12-13  7:03     ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2) KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-12 11:16 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, balbir, akpm, Hugh Dickins

Daisuke Nishimura said:
>
> /*
>  * The page isn't present yet, go ahead with the fault.
>  *
>  * Be careful about the sequence of operations here.
>  * To get its accounting right, reuse_swap_page() must be called
>  * while the page is counted on swap but not yet in mapcount i.e.
>  * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
>  * must be called after the swap_free(), or it will never succeed.
>  * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
>  * in page->private, must be called before reuse_swap_page(),
>  * which may delete_from_swap_cache().
>  */
>
> Hmm.. should we save page->private before calling reuse_swap_page() and
> pass it to mem_cgroup_commit_charge_swapin()? (I think it cannot be
> changed while the page is locked.)
>
That seems not necessary (see below). I'll fix the comment when I'm at my
PC tomorrow.

Consider 2 cases:
  A. the SwapCache is already charged before try_charge_swapin().
  B. the SwapCache is very new and not charged before try_charge_swapin().

Case A.
  0. We have a charge of PAGE_SIZE to this page before reaching here.
  1. try_charge_swapin() is called and charge += PAGE_SIZE.
  2. reuse_swap_page() is called.
     When delete_from_swap_cache() is called:
       2-a. if already mapped, no change in charges.
       2-b. if not mapped, charge -= PAGE_SIZE, the PCG_USED bit is
            cleared, and a charge record is written into swap_cgroup.
     When it is not called:
       2-c. no change in charges.
  3. commit_charge() is called.
       3-a. the PCG_USED bit is set, so charge -= PAGE_SIZE.
       3-b. the PCG_USED bit is cleared, so we set the PCG_USED bit and
            there is no change in charges.
       3-c. no change in charges.
  4-b. swap_free() will clear the record in swap_cgroup.
  Then, finally we have PAGE_SIZE of charge to this page.

Case B.
  0. We have no charges to this page.
  1. try_charge_swapin() is called and charge += PAGE_SIZE.
  2. reuse_swap_page() is called.
       2-a. if delete_from_swap_cache() is called: the page is not mapped,
            but the PCG_USED bit is not set, so there is no change in
            charges finally (it is just recorded in swap_cgroup).
       2-b. if it is not called: no change in charges.
  3. commit_charge() is called and sets the PCG_USED bit. No change in
     charges.
  4. swap_free() is called and clears the record in swap_cgroup.
  Then, finally we have PAGE_SIZE of charge to this page.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2)
  2008-12-12 11:16   ` KAMEZAWA Hiroyuki
@ 2008-12-13  7:03     ` KAMEZAWA Hiroyuki
  2008-12-13  9:49       ` Hugh Dickins
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-13 7:03 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, linux-mm, linux-kernel, balbir, akpm, Hugh Dickins

Updated explanation and fixed comment.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Fix the swapin charge operation of memcg.

Now, memcg has hooks into the swap-out operation and checks whether a
SwapCache page is really unused or not. That check depends on the contents
of struct page, i.e. if PageAnon(page) && page_mapped(page), the page is
recognized as still-in-use.

Now, reuse_swap_page() calls delete_from_swap_cache() before any rmap is
established. Then, in the following sequence

  (Page fault with WRITE)
    try_charge()            (charge += PAGE_SIZE)
    commit_charge()         (checks whether page_cgroup is used or not..)
    reuse_swap_page()
      -> delete_from_swap_cache()
        -> mem_cgroup_uncharge_swapcache()   (charge -= PAGE_SIZE)
    ......

the new charge is uncharged soon....

To avoid this, move commit_charge() to after page_mapcount() goes up to 1.
With this,

    try_charge()            (usage += PAGE_SIZE)
    reuse_swap_page()       (may do usage -= PAGE_SIZE if PCG_USED is set)
    commit_charge()         (if page_cgroup is not marked as PCG_USED,
                             add the new charge.)

accounting will be correct.

Changelog (v1) -> (v2)
 - fixed comment.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/controllers/memcg_test.txt |   50 +++++++++++++++++++++++++++++--
 mm/memory.c                              |   11 +++---
 2 files changed, 54 insertions(+), 7 deletions(-)

Index: mmotm-2.6.28-Dec12/mm/memory.c
===================================================================
--- mmotm-2.6.28-Dec12.orig/mm/memory.c
+++ mmotm-2.6.28-Dec12/mm/memory.c
@@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
 	 * while the page is counted on swap but not yet in mapcount i.e.
 	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
 	 * must be called after the swap_free(), or it will never succeed.
-	 * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
-	 * in page->private, must be called before reuse_swap_page(),
-	 * which may delete_from_swap_cache().
+	 * Because delete_from_swap_cache() may be called by reuse_swap_page(),
+	 * mem_cgroup_commit_charge_swapin() may not be able to find the
+	 * swp_entry in page->private. In this case, a record in swap_cgroup
+	 * is silently discarded at swap_free().
 	 */
-	mem_cgroup_commit_charge_swapin(page, ptr);
 	inc_mm_counter(mm, anon_rss);
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && reuse_swap_page(page)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		write_access = 0;
 	}
-
 	flush_icache_page(vma, page);
 	set_pte_at(mm, address, page_table, pte);
 	page_add_anon_rmap(page, vma, address);
+	/* It's better to call commit-charge after rmap is established */
+	mem_cgroup_commit_charge_swapin(page, ptr);

 	swap_free(entry);
 	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
===================================================================
--- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
+++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
@@ -1,6 +1,6 @@
 Memory Resource Controller(Memcg) Implementation Memo.
-Last Updated: 2008/12/10
-Base Kernel Version: based on 2.6.28-rc7-mm.
+Last Updated: 2008/12/13
+Base Kernel Version: based on 2.6.28-rc8-mm.

 Because VM is getting complex (one of reasons is memcg...), memcg's behavior
 is complex. This is a document for memcg's internal behavior.
@@ -115,6 +115,52 @@ Under below explanation, we assume CONFI
 	(But racy state between (a) and (b) exists. We do check it.)
 	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.

+	In case (a), reuse_swap_page() may call delete_from_swap_cache() if
+	the page can drop its swp_entry and be reused for "WRITE".
+	Note: the page may already be accounted before (A) if it wasn't
+	kicked out to disk before the page fault.
+
+	case A) the page is not accounted as SwapCache and the SwapCache is
+	        deleted by reuse_swap_page().
+	  1. try_charge_swapin() is called and
+	     - charge_for_memory += 1.
+	     - charge_for_memsw  += 1.
+	  2. reuse_swap_page() -> delete_from_swap_cache() is called.
+	     Because the page is not accounted as SwapCache,
+	     there are no changes in accounting.
+	  3. commit_charge_swapin() finds the PCG_USED bit is not set and
+	     sets the PCG_USED bit.
+	     Because page->private was emptied by 2, no changes in charge.
+	  4. swap_free(entry) is called.
+	     - charge_for_memsw -= 1.
+
+	  Finally, charge_for_memory += 1, charge_for_memsw = +-0.
+
+	case B) the page is accounted as SwapCache and the SwapCache is
+	        deleted by reuse_swap_page().
+	  1. try_charge_swapin() is called.
+	     - charge_for_memory += 1.
+	     - charge_for_memsw  += 1.
+	  2. reuse_swap_page() -> delete_from_swap_cache() is called.
+	     The PCG_USED bit is found and cleared.
+	     - charge_for_memory -= 1. (a record is made in swap_cgroup.)
+	  3. commit_charge_swapin() finds the PCG_USED bit is not set.
+	  4. swap_free(entry) is called and
+	     - charge_for_memsw -= 1.
+
+	  Finally, charge_for_memory = +-0, charge_for_memsw = +-0.
+
+	case C) the page is not accounted as SwapCache and reuse_swap_page()
+	        doesn't call delete_from_swap_cache().
+	  1. try_charge_swapin() is called.
+	     - charge_for_memory += 1.
+	     - charge_for_memsw  += 1.
+	  2. commit_charge_swapin() finds the PCG_USED bit is not set
+	     and finds that swap_cgroup records this entry.
+	     - charge_for_memsw -= 1.
+
+	  Finally, charge_for_memory += 1, charge_for_memsw = +-0.
+
 4.2 Swap-out.
 	At swap-out, typical state transition is below.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2)
  2008-12-13  7:03     ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2) KAMEZAWA Hiroyuki
@ 2008-12-13  9:49       ` Hugh Dickins
  2008-12-13 10:27         ` KAMEZAWA Hiroyuki
  2008-12-13 10:38         ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2) KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 13+ messages in thread
From: Hugh Dickins @ 2008-12-13 9:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Daisuke Nishimura, linux-mm, linux-kernel, balbir, akpm

On Sat, 13 Dec 2008, KAMEZAWA Hiroyuki wrote:
> --- mmotm-2.6.28-Dec12.orig/mm/memory.c
> +++ mmotm-2.6.28-Dec12/mm/memory.c
>
> -	mem_cgroup_commit_charge_swapin(page, ptr);
>  	inc_mm_counter(mm, anon_rss);
>  	pte = mk_pte(page, vma->vm_page_prot);
>  	if (write_access && reuse_swap_page(page)) {
>  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>  		write_access = 0;
>  	}
> -
>  	flush_icache_page(vma, page);
>  	set_pte_at(mm, address, page_table, pte);
>  	page_add_anon_rmap(page, vma, address);
> +	/* It's better to call commit-charge after rmap is established */
> +	mem_cgroup_commit_charge_swapin(page, ptr);
>
>  	swap_free(entry);
>  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))

That ordering is back to how it was before I adjusted it
for reuse_swap_page()'s delete_from_swap_cache(), isn't it?

So I don't understand how you've fixed the bug I hit (not an
accounting imbalance but an oops or BUG, I forget) with this
ordering, without making some other change elsewhere.

mem_cgroup_commit_charge_swapin calls swap_cgroup_record with
bogus swp_entry_t 0, which appears to belong to swp_offset 0 of
swp_type 0, but the ctrl->map for type 0 may have been freed
ages ago (we do always start from 0, but maybe we swapped on
type 1 and swapped off type 0 meanwhile).  I'm guessing that
by looking at the code, not by retesting it, so I may have the
details wrong; but I didn't reorder your code just for fun.

Perhaps your restored ordering works if you check PageSwapCache
in mem_cgroup_commit_charge_swapin or check 0 in swap_cgroup_record,
but I don't see that in yesterday's mmotm, nor in this patch.

(And I should admit, I've not even attempted to follow your
accounting justification: I'll leave that to you memcg guys.)

An alternative could be not to clear page->private when deleting
from swap cache: that's only done for tidiness and to force notice
of races like this; but I'd want a much stronger reason to change that.

Or am I making this up?  As I say, I've not tested it this time around.

Hugh

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2)
  2008-12-13  9:49       ` Hugh Dickins
@ 2008-12-13 10:27         ` KAMEZAWA Hiroyuki
  2008-12-15  7:07           ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3) KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-13 10:27 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, linux-mm, linux-kernel,
	balbir, akpm

Hugh Dickins said:
> On Sat, 13 Dec 2008, KAMEZAWA Hiroyuki wrote:
>> --- mmotm-2.6.28-Dec12.orig/mm/memory.c
>> +++ mmotm-2.6.28-Dec12/mm/memory.c
>>
>> -	mem_cgroup_commit_charge_swapin(page, ptr);
>>  	inc_mm_counter(mm, anon_rss);
>>  	pte = mk_pte(page, vma->vm_page_prot);
>>  	if (write_access && reuse_swap_page(page)) {
>>  		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
>>  		write_access = 0;
>>  	}
>> -
>>  	flush_icache_page(vma, page);
>>  	set_pte_at(mm, address, page_table, pte);
>>  	page_add_anon_rmap(page, vma, address);
>> +	/* It's better to call commit-charge after rmap is established */
>> +	mem_cgroup_commit_charge_swapin(page, ptr);
>>
>>  	swap_free(entry);
>>  	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) ||
>>  	    PageMlocked(page))
>
> That ordering is back to how it was before I adjusted it
> for reuse_swap_page()'s delete_from_swap_cache(), isn't it?
>
> So I don't understand how you've fixed the bug I hit (not an
> accounting imbalance but an oops or BUG, I forget) with this
> ordering, without making some other change elsewhere.
>
Ah, this patch is for fixing the accounting bug by this order of calls.
==
> mem_cgroup_commit_charge_swapin calls swap_cgroup_record with
> bogus swp_entry_t 0, which appears to belong to swp_offset 0 of
> swp_type 0, but the ctrl->map for type 0 may have been freed
> ages ago (we do always start from 0, but maybe we swapped on
> type 1 and swapped off type 0 meanwhile).  I'm guessing that
> by looking at the code, not by retesting it, so I may have the
> details wrong; but I didn't reorder your code just for fun.
>
Ah, sorry. commit_charge_swapin() should check that the page is still
SwapCache. Sorry, I'll update this on Monday.

> Perhaps your restored ordering works if you check PageSwapCache
> in mem_cgroup_commit_charge_swapin or check 0 in swap_cgroup_record,
> but I don't see that in yesterday's mmotm, nor in this patch.
>
Yes, I was wrong on that point. I'll add a PageSwapCache check to the
"commit" ops.

> (And I should admit, I've not even attempted to follow your
> accounting justification: I'll leave that to you memcg guys.)
>
> An alternative could be not to clear page->private when deleting
> from swap cache: that's only done for tidiness and to force notice
> of races like this; but I'd want a much stronger reason to change that.
>
Hmm, wouldn't that change add new, unnecessary complexity?

> Or am I making this up?  As I say, I've not tested it this time around.
>
I'll revisit this on Monday and think about the swp_entry==0 problem.
Thank you for pointing it out.

-Kame

> Hugh
>

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)
  2008-12-13 10:27         ` KAMEZAWA Hiroyuki
@ 2008-12-15  7:07           ` KAMEZAWA Hiroyuki
  2008-12-15  8:37             ` Balbir Singh
  ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-12-15 7:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Hugh Dickins, Daisuke Nishimura, linux-mm, linux-kernel, balbir, akpm

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Fix the swapin charge operation of memcg.

Now, memcg has hooks into the swap-out operation and checks whether a
SwapCache page is really unused or not. That check depends on the contents
of struct page, i.e. if PageAnon(page) && page_mapped(page), the page is
recognized as still-in-use.

Now, reuse_swap_page() calls delete_from_swap_cache() before any rmap is
established. Then, in the following sequence

  (Page fault with WRITE)
    try_charge()            (charge += PAGE_SIZE)
    commit_charge()         (checks whether page_cgroup is used or not..)
    reuse_swap_page()
      -> delete_from_swap_cache()
        -> mem_cgroup_uncharge_swapcache()   (charge -= PAGE_SIZE)
    ......

the new charge is uncharged soon....

To avoid this, move commit_charge() to after page_mapcount() goes up to 1.
With this,

    try_charge()            (usage += PAGE_SIZE)
    reuse_swap_page()       (may do usage -= PAGE_SIZE if PCG_USED is set)
    commit_charge()         (if page_cgroup is not marked as PCG_USED,
                             add the new charge.)

accounting will be correct.

Changelog (v2) -> (v3)
 - fixed an invalid charge for swp_entry==0.
 - updated documentation.
Changelog (v1) -> (v2)
 - fixed comment.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/controllers/memcg_test.txt |   41 +++++++++++++++++++++++++++----
 mm/memcontrol.c                          |    7 +++--
 mm/memory.c                              |   11 ++++----
 3 files changed, 46 insertions(+), 13 deletions(-)

Index: mmotm-2.6.28-Dec12/mm/memory.c
===================================================================
--- mmotm-2.6.28-Dec12.orig/mm/memory.c
+++ mmotm-2.6.28-Dec12/mm/memory.c
@@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct
 	 * while the page is counted on swap but not yet in mapcount i.e.
 	 * before page_add_anon_rmap() and swap_free(); try_to_free_swap()
 	 * must be called after the swap_free(), or it will never succeed.
-	 * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry
-	 * in page->private, must be called before reuse_swap_page(),
-	 * which may delete_from_swap_cache().
+	 * Because delete_from_swap_cache() may be called by reuse_swap_page(),
+	 * mem_cgroup_commit_charge_swapin() may not be able to find the
+	 * swp_entry in page->private. In this case, a record in swap_cgroup
+	 * is silently discarded at swap_free().
 	 */
-	mem_cgroup_commit_charge_swapin(page, ptr);
 	inc_mm_counter(mm, anon_rss);
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && reuse_swap_page(page)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		write_access = 0;
 	}
-
 	flush_icache_page(vma, page);
 	set_pte_at(mm, address, page_table, pte);
 	page_add_anon_rmap(page, vma, address);
+	/* It's better to call commit-charge after rmap is established */
+	mem_cgroup_commit_charge_swapin(page, ptr);

 	swap_free(entry);
 	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
===================================================================
--- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt
+++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt
@@ -1,6 +1,6 @@
 Memory Resource Controller(Memcg) Implementation Memo.
-Last Updated: 2008/12/10
-Base Kernel Version: based on 2.6.28-rc7-mm.
+Last Updated: 2008/12/15
+Base Kernel Version: based on 2.6.28-rc8-mm.

 Because VM is getting complex (one of reasons is memcg...), memcg's behavior
 is complex. This is a document for memcg's internal behavior.
@@ -111,9 +111,40 @@ Under below explanation, we assume CONFI
 	(b) If the SwapCache has been mapped by processes, it has been
 	    charged already.

-	In case (a), we charge it. In case (b), we don't charge it.
-	(But racy state between (a) and (b) exists. We do check it.)
-	At charging, a charge recorded in swap_cgroup is moved to page_cgroup.
+	This swap-in is one of the most complicated operations. In
+	do_swap_page(), the following events occur when the pte is unchanged.
+
+	(1) the page (SwapCache) is looked up.
+	(2) lock_page()
+	(3) try_charge_swapin()
+	(4) reuse_swap_page() (may call delete_from_swap_cache())
+	(5) commit_charge_swapin()
+	(6) swap_free().
+
+	Consider the following situations, for example.
+
+	(A) The page has not been charged before (2) and reuse_swap_page()
+	    doesn't call delete_from_swap_cache().
+	(B) The page has not been charged before (2) and reuse_swap_page()
+	    calls delete_from_swap_cache().
+	(C) The page has been charged before (2) and reuse_swap_page() doesn't
+	    call delete_from_swap_cache().
+	(D) The page has been charged before (2) and reuse_swap_page() calls
+	    delete_from_swap_cache().
+
+	The memory.usage/memsw.usage changes for this page/swp_entry will be:
+	  Case         (A)      (B)      (C)      (D)
+	  Event
+	  Before (2)   0/ 1     0/ 1     1/ 1     1/ 1
+	  ===========================================
+	  (3)         +1/+1    +1/+1    +1/+1    +1/+1
+	  (4)          -        0/ 0     -       -1/ 0
+	  (5)          0/-1     0/-1    -1/-1     0/ 0
+	  (6)          -        -        -        0/-1
+	  ===========================================
+	  Result       1/ 1     1/ 1     1/ 1     1/ 1
+
+	In all cases, the charges to this page end up as 1/ 1.

 4.2 Swap-out.
 	At swap-out, typical state transition is below.
Index: mmotm-2.6.28-Dec12/mm/memcontrol.c
===================================================================
--- mmotm-2.6.28-Dec12.orig/mm/memcontrol.c
+++ mmotm-2.6.28-Dec12/mm/memcontrol.c
@@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str
 	/*
 	 * Now swap is on-memory. This means this page may be
 	 * counted both as mem and swap....double count.
-	 * Fix it by uncharging from memsw. This SwapCache is stable
-	 * because we're still under lock_page().
+	 * Fix it by uncharging from memsw. Basically, this SwapCache is
+	 * stable under lock_page(), but in do_swap_page()::memory.c,
+	 * reuse_swap_page() may call delete_from_swap_cache() before we
+	 * reach here.
 	 */
-	if (do_swap_account) {
+	if (do_swap_account && PageSwapCache(page)) {
 		swp_entry_t ent = {.val = page_private(page)};
 		struct mem_cgroup *memcg;
 		memcg = swap_cgroup_record(ent, NULL);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3) 2008-12-15 7:07 ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3) KAMEZAWA Hiroyuki @ 2008-12-15 8:37 ` Balbir Singh 2008-12-15 8:40 ` KAMEZAWA Hiroyuki 2008-12-15 10:34 ` Hugh Dickins 2008-12-16 4:02 ` [PATCH mmotm] memcg: fix for documentation (Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)) Daisuke Nishimura 2 siblings, 1 reply; 13+ messages in thread From: Balbir Singh @ 2008-12-15 8:37 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Hugh Dickins, Daisuke Nishimura, linux-mm, linux-kernel, akpm * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2008-12-15 16:07:51]: > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > Fix swapin charge operation of memcg. > > Now, memcg has hooks to swap-out operation and checks SwapCache is really > unused or not. That check depends on contents of struct page. > I.e. If PageAnon(page) && page_mapped(page), the page is recoginized as > still-in-use. > > Now, reuse_swap_page() calles delete_from_swap_cache() before establishment > of any rmap. Then, in followinig sequence > > (Page fault with WRITE) > try_charge() (charge += PAGESIZE) > commit_charge() (Check page_cgroup is used or not..) > reuse_swap_page() > -> delete_from_swapcache() > -> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE) > ...... > New charge is uncharged soon.... > To avoid this, move commit_charge() after page_mapcount() goes up to 1. > By this, > > try_charge() (usage += PAGESIZE) > reuse_swap_page() (may usage -= PAGESIZE if PCG_USED is set) > commit_charge() (If page_cgroup is not marked as PCG_USED, > add new charge.) > Accounting will be correct. > > Changelog (v2) -> (v3) > - fixed invalid charge to swp_entry==0. > - updated documentation. > Changelog (v1) -> (v2) > - fixed comment. 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > --- > Documentation/controllers/memcg_test.txt | 41 +++++++++++++++++++++++++++---- > mm/memcontrol.c | 7 +++-- > mm/memory.c | 11 ++++---- > 3 files changed, 46 insertions(+), 13 deletions(-) > > Index: mmotm-2.6.28-Dec12/mm/memory.c > =================================================================== > --- mmotm-2.6.28-Dec12.orig/mm/memory.c > +++ mmotm-2.6.28-Dec12/mm/memory.c > @@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct > * while the page is counted on swap but not yet in mapcount i.e. > * before page_add_anon_rmap() and swap_free(); try_to_free_swap() > * must be called after the swap_free(), or it will never succeed. > - * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry > - * in page->private, must be called before reuse_swap_page(), > - * which may delete_from_swap_cache(). > + * Because delete_from_swap_page() may be called by reuse_swap_page(), > + * mem_cgroup_commit_charge_swapin() may not be able to find swp_entry > + * in page->private. In this case, a record in swap_cgroup is silently > + * discarded at swap_free(). > */ > > - mem_cgroup_commit_charge_swapin(page, ptr); > inc_mm_counter(mm, anon_rss); > pte = mk_pte(page, vma->vm_page_prot); > if (write_access && reuse_swap_page(page)) { > pte = maybe_mkwrite(pte_mkdirty(pte), vma); > write_access = 0; > } > - Removal of unassociated lines, not sure if that is a good practice. 
> flush_icache_page(vma, page); > set_pte_at(mm, address, page_table, pte); > page_add_anon_rmap(page, vma, address); > + /* It's better to call commit-charge after rmap is established */ > + mem_cgroup_commit_charge_swapin(page, ptr); > Yes, it does make sense > swap_free(entry); > if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) > Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt > =================================================================== > --- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt > +++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt > @@ -1,6 +1,6 @@ > Memory Resource Controller(Memcg) Implementation Memo. > -Last Updated: 2008/12/10 > -Base Kernel Version: based on 2.6.28-rc7-mm. > +Last Updated: 2008/12/15 > +Base Kernel Version: based on 2.6.28-rc8-mm. > > Because VM is getting complex (one of reasons is memcg...), memcg's behavior > is complex. This is a document for memcg's internal behavior. > @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI > (b) If the SwapCache has been mapped by processes, it has been > charged already. > > - In case (a), we charge it. In case (b), we don't charge it. > - (But racy state between (a) and (b) exists. We do check it.) > - At charging, a charge recorded in swap_cgroup is moved to page_cgroup. > + This swap-in is one of the most complicated work. In do_swap_page(), > + following events occur when pte is unchanged. > + > + (1) the page (SwapCache) is looked up. > + (2) lock_page() > + (3) try_charge_swapin() > + (4) reuse_swap_page() (may call delete_swap_cache()) > + (5) commit_charge_swapin() > + (6) swap_free(). > + > + Considering following situation for example. > + > + (A) The page has not been charged before (2) and reuse_swap_page() > + doesn't call delete_from_swap_cache(). > + (B) The page has not been charged before (2) and reuse_swap_page() > + calls delete_from_swap_cache(). 
> + (C) The page has been charged before (2) and reuse_swap_page() doesn't > + call delete_from_swap_cache(). > + (D) The page has been charged before (2) and reuse_swap_page() calls > + delete_from_swap_cache(). > + > + memory.usage/memsw.usage changes to this page/swp_entry will be > + Case (A) (B) (C) (D) > + Event > + Before (2) 0/ 1 0/ 1 1/ 1 1/ 1 > + =========================================== > + (3) +1/+1 +1/+1 +1/+1 +1/+1 > + (4) - 0/ 0 - -1/ 0 > + (5) 0/ 1 0/-1 -1/-1 0/ 0 > + (6) - - - 0/-1 > + =========================================== > + Result 1/ 1 1/1 1/ 1 1/ 1 > + > + In any cases, charges to this page should be 1/ 1. > The documentation patch failed to apply for me > 4.2 Swap-out. > At swap-out, typical state transition is below. > Index: mmotm-2.6.28-Dec12/mm/memcontrol.c > =================================================================== > --- mmotm-2.6.28-Dec12.orig/mm/memcontrol.c > +++ mmotm-2.6.28-Dec12/mm/memcontrol.c > @@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str > /* > * Now swap is on-memory. This means this page may be > * counted both as mem and swap....double count. > - * Fix it by uncharging from memsw. This SwapCache is stable > - * because we're still under lock_page(). > + * Fix it by uncharging from memsw. Basically, this SwapCache is stable > + * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page() > + * may call delete_from_swap_cache() before reach here. > */ > - if (do_swap_account) { > + if (do_swap_account && PageSwapCache(page)) { > swp_entry_t ent = {.val = page_private(page)}; > struct mem_cgroup *memcg; > memcg = swap_cgroup_record(ent, NULL); > > Looks good to me Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com> -- Balbir ----- End forwarded message ----- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3) 2008-12-15 8:37 ` Balbir Singh @ 2008-12-15 8:40 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2008-12-15 8:40 UTC (permalink / raw) To: balbir; +Cc: Hugh Dickins, Daisuke Nishimura, linux-mm, linux-kernel, akpm On Mon, 15 Dec 2008 14:07:26 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2008-12-15 16:07:51]: > > > > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > > > Fix swapin charge operation of memcg. > > > > Now, memcg has hooks to swap-out operation and checks SwapCache is really > > unused or not. That check depends on contents of struct page. > > I.e. If PageAnon(page) && page_mapped(page), the page is recoginized as > > still-in-use. > > > > Now, reuse_swap_page() calles delete_from_swap_cache() before establishment > > of any rmap. Then, in followinig sequence > > > > (Page fault with WRITE) > > try_charge() (charge += PAGESIZE) > > commit_charge() (Check page_cgroup is used or not..) > > reuse_swap_page() > > -> delete_from_swapcache() > > -> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE) > > ...... > > New charge is uncharged soon.... > > To avoid this, move commit_charge() after page_mapcount() goes up to 1. > > By this, > > > > try_charge() (usage += PAGESIZE) > > reuse_swap_page() (may usage -= PAGESIZE if PCG_USED is set) > > commit_charge() (If page_cgroup is not marked as PCG_USED, > > add new charge.) > > Accounting will be correct. > > > > Changelog (v2) -> (v3) > > - fixed invalid charge to swp_entry==0. > > - updated documentation. > > Changelog (v1) -> (v2) > > - fixed comment. 
> > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > --- > > Documentation/controllers/memcg_test.txt | 41 +++++++++++++++++++++++++++---- > > mm/memcontrol.c | 7 +++-- > > mm/memory.c | 11 ++++---- > > 3 files changed, 46 insertions(+), 13 deletions(-) > > > > Index: mmotm-2.6.28-Dec12/mm/memory.c > > =================================================================== > > --- mmotm-2.6.28-Dec12.orig/mm/memory.c > > +++ mmotm-2.6.28-Dec12/mm/memory.c > > @@ -2428,22 +2428,23 @@ static int do_swap_page(struct mm_struct > > * while the page is counted on swap but not yet in mapcount i.e. > > * before page_add_anon_rmap() and swap_free(); try_to_free_swap() > > * must be called after the swap_free(), or it will never succeed. > > - * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry > > - * in page->private, must be called before reuse_swap_page(), > > - * which may delete_from_swap_cache(). > > + * Because delete_from_swap_page() may be called by reuse_swap_page(), > > + * mem_cgroup_commit_charge_swapin() may not be able to find swp_entry > > + * in page->private. In this case, a record in swap_cgroup is silently > > + * discarded at swap_free(). > > */ > > > > - mem_cgroup_commit_charge_swapin(page, ptr); > > inc_mm_counter(mm, anon_rss); > > pte = mk_pte(page, vma->vm_page_prot); > > if (write_access && reuse_swap_page(page)) { > > pte = maybe_mkwrite(pte_mkdirty(pte), vma); > > write_access = 0; > > } > > - > > Removal of unassociated lines, not sure if that is a good practice. > my mistake... 
> > flush_icache_page(vma, page); > > set_pte_at(mm, address, page_table, pte); > > page_add_anon_rmap(page, vma, address); > > + /* It's better to call commit-charge after rmap is established */ > > + mem_cgroup_commit_charge_swapin(page, ptr); > > > > Yes, it does make sense > > > swap_free(entry); > > if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) > > Index: mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt > > =================================================================== > > --- mmotm-2.6.28-Dec12.orig/Documentation/controllers/memcg_test.txt > > +++ mmotm-2.6.28-Dec12/Documentation/controllers/memcg_test.txt > > @@ -1,6 +1,6 @@ > > Memory Resource Controller(Memcg) Implementation Memo. > > -Last Updated: 2008/12/10 > > -Base Kernel Version: based on 2.6.28-rc7-mm. > > +Last Updated: 2008/12/15 > > +Base Kernel Version: based on 2.6.28-rc8-mm. > > > > Because VM is getting complex (one of reasons is memcg...), memcg's behavior > > is complex. This is a document for memcg's internal behavior. > > @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI > > (b) If the SwapCache has been mapped by processes, it has been > > charged already. > > > > - In case (a), we charge it. In case (b), we don't charge it. > > - (But racy state between (a) and (b) exists. We do check it.) > > - At charging, a charge recorded in swap_cgroup is moved to page_cgroup. > > + This swap-in is one of the most complicated work. In do_swap_page(), > > + following events occur when pte is unchanged. > > + > > + (1) the page (SwapCache) is looked up. > > + (2) lock_page() > > + (3) try_charge_swapin() > > + (4) reuse_swap_page() (may call delete_swap_cache()) > > + (5) commit_charge_swapin() > > + (6) swap_free(). > > + > > + Considering following situation for example. > > + > > + (A) The page has not been charged before (2) and reuse_swap_page() > > + doesn't call delete_from_swap_cache(). 
> > + (B) The page has not been charged before (2) and reuse_swap_page() > > + calls delete_from_swap_cache(). > > + (C) The page has been charged before (2) and reuse_swap_page() doesn't > > + call delete_from_swap_cache(). > > + (D) The page has been charged before (2) and reuse_swap_page() calls > > + delete_from_swap_cache(). > > + > > + memory.usage/memsw.usage changes to this page/swp_entry will be > > + Case (A) (B) (C) (D) > > + Event > > + Before (2) 0/ 1 0/ 1 1/ 1 1/ 1 > > + =========================================== > > + (3) +1/+1 +1/+1 +1/+1 +1/+1 > > + (4) - 0/ 0 - -1/ 0 > > + (5) 0/ 1 0/-1 -1/-1 0/ 0 > > + (6) - - - 0/-1 > > + =========================================== > > + Result 1/ 1 1/1 1/ 1 1/ 1 > > + > > + In any cases, charges to this page should be 1/ 1. > > > > The documentation patch failed to apply for me > Hmm... I'll check my queue again. Thanks, -Kame > > 4.2 Swap-out. > > At swap-out, typical state transition is below. > > Index: mmotm-2.6.28-Dec12/mm/memcontrol.c > > =================================================================== > > --- mmotm-2.6.28-Dec12.orig/mm/memcontrol.c > > +++ mmotm-2.6.28-Dec12/mm/memcontrol.c > > @@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str > > /* > > * Now swap is on-memory. This means this page may be > > * counted both as mem and swap....double count. > > - * Fix it by uncharging from memsw. This SwapCache is stable > > - * because we're still under lock_page(). > > + * Fix it by uncharging from memsw. Basically, this SwapCache is stable > > + * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page() > > + * may call delete_from_swap_cache() before reach here. 
> > */ > > - if (do_swap_account) { > > + if (do_swap_account && PageSwapCache(page)) { > > swp_entry_t ent = {.val = page_private(page)}; > > struct mem_cgroup *memcg; > > memcg = swap_cgroup_record(ent, NULL); > > > > > > > Looks good to me > > Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> > Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com> > > -- > Balbir > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
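The `PageSwapCache(page)` check added above can be sketched as a toy Python model (this is not kernel code; `Page`, `delete_from_swap_cache`, and the `records` map are simplified stand-ins for `struct page`, the swap-cache deletion path, and the `swap_cgroup` record array):

```python
class Page:
    """Toy stand-in for struct page: only the fields this race touches."""
    def __init__(self, entry):
        self.swapcache = True       # PageSwapCache flag
        self.private = entry        # swp_entry_t lives in page->private

def delete_from_swap_cache(page):
    # reuse_swap_page() may do this before commit_charge_swapin() runs
    page.swapcache = False
    page.private = 0                # the kernel clears page->private here

def commit_charge_swapin(page, lookup_record):
    """Return the swap_cgroup record to move to page_cgroup, or None."""
    if page.swapcache:              # the guard this patch adds
        return lookup_record(page.private)
    # SwapCache already gone: the record is left for swap_free() to drop
    return None

records = {42: "memcg-A"}           # hypothetical swap_cgroup map
page = Page(42)
delete_from_swap_cache(page)
# Without the guard, commit would look up bogus entry 0 in a possibly
# stale map; with it, commit simply skips the memsw fixup.
assert commit_charge_swapin(page, records.get) is None
```

This mirrors Hugh's point downthread: after `delete_from_swap_cache()`, `page->private` is 0, so dereferencing it as a swp_entry would index the wrong (possibly freed) `ctrl->map`.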
* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)
  2008-12-15  7:07 ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3) KAMEZAWA Hiroyuki
  2008-12-15  8:37   ` Balbir Singh
@ 2008-12-15 10:34   ` Hugh Dickins
  2008-12-16  4:02   ` [PATCH mmotm] memcg: fix for documentation (Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)) Daisuke Nishimura
  2 siblings, 0 replies; 13+ messages in thread
From: Hugh Dickins @ 2008-12-15 10:34 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Daisuke Nishimura, linux-mm, linux-kernel, balbir, akpm

On Mon, 15 Dec 2008, KAMEZAWA Hiroyuki wrote:
>
> Fix swapin charge operation of memcg.
>
> @@ -1139,10 +1139,11 @@ void mem_cgroup_commit_charge_swapin(str
>  	/*
>  	 * Now swap is on-memory. This means this page may be
>  	 * counted both as mem and swap....double count.
> -	 * Fix it by uncharging from memsw. This SwapCache is stable
> -	 * because we're still under lock_page().
> +	 * Fix it by uncharging from memsw. Basically, this SwapCache is stable
> +	 * under lock_page(). But in do_swap_page()::memory.c, reuse_swap_page()
> +	 * may call delete_from_swap_cache() before reach here.
>  	 */
> -	if (do_swap_account) {
> +	if (do_swap_account && PageSwapCache(page)) {
>  		swp_entry_t ent = {.val = page_private(page)};
>  		struct mem_cgroup *memcg;
>  		memcg = swap_cgroup_record(ent, NULL);

Yes, that addition looks good to me.

Hugh
* [PATCH mmotm] memcg: fix for documentation (Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)) 2008-12-15 7:07 ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3) KAMEZAWA Hiroyuki 2008-12-15 8:37 ` Balbir Singh 2008-12-15 10:34 ` Hugh Dickins @ 2008-12-16 4:02 ` Daisuke Nishimura 2008-12-16 4:53 ` KAMEZAWA Hiroyuki 2 siblings, 1 reply; 13+ messages in thread From: Daisuke Nishimura @ 2008-12-16 4:02 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Hugh Dickins, linux-mm, linux-kernel, balbir, akpm, nishimura Sorry for late reply. > @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI > (b) If the SwapCache has been mapped by processes, it has been > charged already. > > - In case (a), we charge it. In case (b), we don't charge it. > - (But racy state between (a) and (b) exists. We do check it.) > - At charging, a charge recorded in swap_cgroup is moved to page_cgroup. > + This swap-in is one of the most complicated work. In do_swap_page(), > + following events occur when pte is unchanged. > + > + (1) the page (SwapCache) is looked up. > + (2) lock_page() > + (3) try_charge_swapin() > + (4) reuse_swap_page() (may call delete_swap_cache()) > + (5) commit_charge_swapin() > + (6) swap_free(). > + > + Considering following situation for example. > + > + (A) The page has not been charged before (2) and reuse_swap_page() > + doesn't call delete_from_swap_cache(). > + (B) The page has not been charged before (2) and reuse_swap_page() > + calls delete_from_swap_cache(). > + (C) The page has been charged before (2) and reuse_swap_page() doesn't > + call delete_from_swap_cache(). > + (D) The page has been charged before (2) and reuse_swap_page() calls > + delete_from_swap_cache(). 
> + > + memory.usage/memsw.usage changes to this page/swp_entry will be > + Case (A) (B) (C) (D) > + Event > + Before (2) 0/ 1 0/ 1 1/ 1 1/ 1 > + =========================================== > + (3) +1/+1 +1/+1 +1/+1 +1/+1 > + (4) - 0/ 0 - -1/ 0 > + (5) 0/ 1 0/-1 -1/-1 0/ 0 > + (6) - - - 0/-1 > + =========================================== > + Result 1/ 1 1/1 1/ 1 1/ 1 > + > + In any cases, charges to this page should be 1/ 1. > I've verified that charges will result in valid values by tracing source code in all of these cases, but in case of (B) I don't think commit_charge_swapin does memsw-- because PageSwapCache has been cleared already. swap_free does memsw-- in this case. I attached a fix patch. Thanks, Daisuke Nishimura. === From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> fix for documentation. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> --- Documentation/controllers/memcg_test.txt | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/controllers/memcg_test.txt b/Documentation/controllers/memcg_test.txt index 3c1458a..08d4d3e 100644 --- a/Documentation/controllers/memcg_test.txt +++ b/Documentation/controllers/memcg_test.txt @@ -139,10 +139,10 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. =========================================== (3) +1/+1 +1/+1 +1/+1 +1/+1 (4) - 0/ 0 - -1/ 0 - (5) 0/ 1 0/-1 -1/-1 0/ 0 - (6) - - - 0/-1 + (5) 0/-1 0/ 0 -1/-1 0/ 0 + (6) - 0/-1 - 0/-1 =========================================== - Result 1/ 1 1/1 1/ 1 1/ 1 + Result 1/ 1 1/ 1 1/ 1 1/ 1 In any cases, charges to this page should be 1/ 1. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH mmotm] memcg: fix for documentation (Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)) 2008-12-16 4:02 ` [PATCH mmotm] memcg: fix for documentation (Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)) Daisuke Nishimura @ 2008-12-16 4:53 ` KAMEZAWA Hiroyuki 0 siblings, 0 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2008-12-16 4:53 UTC (permalink / raw) To: Daisuke Nishimura; +Cc: Hugh Dickins, linux-mm, linux-kernel, balbir, akpm On Tue, 16 Dec 2008 13:02:30 +0900 Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote: > Sorry for late reply. > > > @@ -111,9 +111,40 @@ Under below explanation, we assume CONFI > > (b) If the SwapCache has been mapped by processes, it has been > > charged already. > > > > - In case (a), we charge it. In case (b), we don't charge it. > > - (But racy state between (a) and (b) exists. We do check it.) > > - At charging, a charge recorded in swap_cgroup is moved to page_cgroup. > > + This swap-in is one of the most complicated work. In do_swap_page(), > > + following events occur when pte is unchanged. > > + > > + (1) the page (SwapCache) is looked up. > > + (2) lock_page() > > + (3) try_charge_swapin() > > + (4) reuse_swap_page() (may call delete_swap_cache()) > > + (5) commit_charge_swapin() > > + (6) swap_free(). > > + > > + Considering following situation for example. > > + > > + (A) The page has not been charged before (2) and reuse_swap_page() > > + doesn't call delete_from_swap_cache(). > > + (B) The page has not been charged before (2) and reuse_swap_page() > > + calls delete_from_swap_cache(). > > + (C) The page has been charged before (2) and reuse_swap_page() doesn't > > + call delete_from_swap_cache(). > > + (D) The page has been charged before (2) and reuse_swap_page() calls > > + delete_from_swap_cache(). 
> > + > > + memory.usage/memsw.usage changes to this page/swp_entry will be > > + Case (A) (B) (C) (D) > > + Event > > + Before (2) 0/ 1 0/ 1 1/ 1 1/ 1 > > + =========================================== > > + (3) +1/+1 +1/+1 +1/+1 +1/+1 > > + (4) - 0/ 0 - -1/ 0 > > + (5) 0/ 1 0/-1 -1/-1 0/ 0 > > + (6) - - - 0/-1 > > + =========================================== > > + Result 1/ 1 1/1 1/ 1 1/ 1 > > + > > + In any cases, charges to this page should be 1/ 1. > > > I've verified that charges will result in valid values by tracing source code > in all of these cases, but in case of (B) I don't think commit_charge_swapin > does memsw-- because PageSwapCache has been cleared already. swap_free does > memsw-- in this case. > > I attached a fix patch. > you're right. it comes from this version's new change....sorry. Acked-by; KAMEZAWA Hiroyuki <kamezaw.hiroyu@jp.fujitsu.com> > Thanks, > Daisuke Nishimura. > > === > From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> > > fix for documentation. > > Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> > --- > Documentation/controllers/memcg_test.txt | 6 +++--- > 1 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/Documentation/controllers/memcg_test.txt b/Documentation/controllers/memcg_test.txt > index 3c1458a..08d4d3e 100644 > --- a/Documentation/controllers/memcg_test.txt > +++ b/Documentation/controllers/memcg_test.txt > @@ -139,10 +139,10 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. > =========================================== > (3) +1/+1 +1/+1 +1/+1 +1/+1 > (4) - 0/ 0 - -1/ 0 > - (5) 0/ 1 0/-1 -1/-1 0/ 0 > - (6) - - - 0/-1 > + (5) 0/-1 0/ 0 -1/-1 0/ 0 > + (6) - 0/-1 - 0/-1 > =========================================== > - Result 1/ 1 1/1 1/ 1 1/ 1 > + Result 1/ 1 1/ 1 1/ 1 1/ 1 > > In any cases, charges to this page should be 1/ 1. > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
* Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2) 2008-12-13 9:49 ` Hugh Dickins 2008-12-13 10:27 ` KAMEZAWA Hiroyuki @ 2008-12-13 10:38 ` KAMEZAWA Hiroyuki 1 sibling, 0 replies; 13+ messages in thread From: KAMEZAWA Hiroyuki @ 2008-12-13 10:38 UTC (permalink / raw) To: Hugh Dickins Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, linux-mm, linux-kernel, balbir, akpm Hugh Dickins said: > On Sat, 13 Dec 2008, KAMEZAWA Hiroyuki wrote: >> --- mmotm-2.6.28-Dec12.orig/mm/memory.c >> +++ mmotm-2.6.28-Dec12/mm/memory.c >> >> - mem_cgroup_commit_charge_swapin(page, ptr); >> inc_mm_counter(mm, anon_rss); >> pte = mk_pte(page, vma->vm_page_prot); >> if (write_access && reuse_swap_page(page)) { >> pte = maybe_mkwrite(pte_mkdirty(pte), vma); >> write_access = 0; >> } >> - >> flush_icache_page(vma, page); >> set_pte_at(mm, address, page_table, pte); >> page_add_anon_rmap(page, vma, address); >> + /* It's better to call commit-charge after rmap is established */ >> + mem_cgroup_commit_charge_swapin(page, ptr); >> >> swap_free(entry); >> if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || >> PageMlocked(page)) > > That ordering is back to how it was before I adjusted it > for reuse_swap_page()'s delete_from_swap_cache(), isn't it? > > So I don't understand how you've fixed the bug I hit (not an > accounting imbalance but an oops or BUG, I forget) with this > ordering, without making some other change elsewhere. > Ah, this is a fix for the new bug by this order. == try_charge() commit_charge() reuse_swap_page() -> delete_from_swapcache() -> uncharge_swapcache(). increase mapcount here. == Because ucharge_swapcache() assumes following a. if mapcount==0, this swap cache is of no use and will be discarded. b. if mapcount >0, this swap cache is in use. A charge commited by commit_charge() is discarded by reuse_swap_page(). By delaying commit (means checking flag of page_cgroup). == try_charge() reuse_swap_page() commit_charge() == the leak of charge doesn't happen. 
(reuse_swap_page() may drop the page from swap cache, but that is no
problem for commit. But as you say, this has the swp_entry==0 bug.)

> mem_cgroup_commit_charge_swapin calls swap_cgroup_record with
> bogus swp_entry_t 0, which appears to belong to swp_offset 0 of
> swp_type 0, but the ctrl->map for type 0 may have been freed
> ages ago (we do always start from 0, but maybe we swapped on
> type 1 and swapped off type 0 meanwhile). I'm guessing that
> by looking at the code, not by retesting it, so I may have the
> details wrong; but I didn't reorder your code just for fun.
>
> Perhaps your restored ordering works if you check PageSwapCache
> in mem_cgroup_commit_charge_swapin or check 0 in swap_cgroup_record,
> but I don't see that in yesterday's mmotm, nor in this patch.
>
Ahhh, sorry. OK, swp_entry==0 is valid... Sigh... I'll revisit this and
check how commit_charge() works. I think checking PageSwapCache() is
enough, but if not, I'll do something else. (Maybe Nishimura's suggestion
to pass swp_entry directly to commit_charge() is one way.)

> (And I should admit, I've not even attempted to follow your
> accounting justification: I'll leave that to you memcg guys.)
>
Sorry for the complication ;(

> An alternative could be not to clear page->private when deleting
> from swap cache, that's only done for tidiness and to force notice
> of races like this; but I'd want a much stronger reason to change that.
>
It seems that would add another complex or unexpected behavior. I think
I can work around it somehow.

> Or am I making this up? As I say, I've not tested it this time around.
>
I'll ask you if I find I can't do anything ;( Thank you for pointing
this out! I'll revisit this on Monday.

-Kame
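Kame's two sequences above (commit before vs. after reuse_swap_page()) can be contrasted in a toy Python model of the write swap-fault path — hypothetical simplified state, not the kernel's actual data structures:

```python
def fault(commit_before_reuse):
    """Model a write fault on a SwapCache page that is already
    charged (PCG_USED set) but not yet mapped anywhere."""
    usage = 1                      # page on memory, already charged
    state = {"pc_used": True, "mapcount": 0}

    def commit_charge():
        nonlocal usage
        if state["pc_used"]:
            usage -= 1             # looks already charged: drop new charge
        else:
            state["pc_used"] = True

    def reuse_swap_page():
        nonlocal usage
        # delete_from_swap_cache() -> mem_cgroup_uncharge_swapcache():
        # an unmapped SwapCache looks unused, so its charge is dropped
        if state["pc_used"] and state["mapcount"] == 0:
            usage -= 1
            state["pc_used"] = False

    usage += 1                     # try_charge()
    if commit_before_reuse:        # old ordering (the bug)
        commit_charge()
        reuse_swap_page()
        state["mapcount"] += 1     # page_add_anon_rmap()
    else:                          # fixed ordering
        reuse_swap_page()
        state["mapcount"] += 1     # page_add_anon_rmap()
        commit_charge()
    return usage

assert fault(commit_before_reuse=True) == 0   # mapped page, zero charge: leak
assert fault(commit_before_reuse=False) == 1  # correctly accounted
```

With the old ordering the committed charge is uncharged again by reuse_swap_page(), leaving a mapped page with no charge; delaying the commit until after rmap is established keeps the accounting at one charge.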
end of thread, other threads:[~2008-12-16  4:52 UTC | newest]

Thread overview: 13+ messages
2008-12-12  8:29 [BUGFIX][PATCH mmotm] memcg fix swap accounting leak KAMEZAWA Hiroyuki
2008-12-12  9:43 ` Daisuke Nishimura
2008-12-12 11:16   ` KAMEZAWA Hiroyuki
2008-12-13  7:03     ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2) KAMEZAWA Hiroyuki
2008-12-13  9:49       ` Hugh Dickins
2008-12-13 10:27         ` KAMEZAWA Hiroyuki
2008-12-15  7:07           ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3) KAMEZAWA Hiroyuki
2008-12-15  8:37             ` Balbir Singh
2008-12-15  8:40               ` KAMEZAWA Hiroyuki
2008-12-15 10:34             ` Hugh Dickins
2008-12-16  4:02             ` [PATCH mmotm] memcg: fix for documentation (Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v3)) Daisuke Nishimura
2008-12-16  4:53               ` KAMEZAWA Hiroyuki
2008-12-13 10:38         ` [BUGFIX][PATCH mmotm] memcg fix swap accounting leak (v2) KAMEZAWA Hiroyuki