linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>
Subject: Re: [RFC][PATCH 8/10] memcg: clean up charge/uncharge anon
Date: Tue, 29 Sep 2009 10:26:53 +0900	[thread overview]
Message-ID: <20090929102653.612cc2a4.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <20090929092413.9526de0b.nishimura@mxp.nes.nec.co.jp>


Thank you for testing.

On Tue, 29 Sep 2009 09:24:13 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Fri, 25 Sep 2009 17:28:50 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > This may need careful review.
> > 
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > In old codes, this function was used for other purposes rather
> > than charginc new anon pages. But now, this function is (ranamed) and
> > used only for new pages.
> > 
> > For the same kind of reason, ucharge_page() should use VM_BUG_ON().
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  mm/memcontrol.c |   27 +++++++++++++--------------
> >  1 file changed, 13 insertions(+), 14 deletions(-)
> > 
> > Index: temp-mmotm/mm/memcontrol.c
> > ===================================================================
> > --- temp-mmotm.orig/mm/memcontrol.c
> > +++ temp-mmotm/mm/memcontrol.c
> > @@ -1638,15 +1638,8 @@ int mem_cgroup_newpage_charge(struct pag
> >  		return 0;
> >  	if (PageCompound(page))
> >  		return 0;
> > -	/*
> > -	 * If already mapped, we don't have to account.
> > -	 * If page cache, page->mapping has address_space.
> > -	 * But page->mapping may have out-of-use anon_vma pointer,
> > -	 * detecit it by PageAnon() check. newly-mapped-anon's page->mapping
> > -	 * is NULL.
> > -  	 */
> > -	if (page_mapped(page) || (page->mapping && !PageAnon(page)))
> > -		return 0;
> > +	/* This function is "newpage_charge" and called right after alloc */
> > +	VM_BUG_ON(page_mapped(page) || (page->mapping && !PageAnon(page)));
> >  	if (unlikely(!mm))
> >  		mm = &init_mm;
> >  	return mem_cgroup_charge_common(page, mm, gfp_mask,
> I think this VM_BUG_ON() is vaild.
> 
> > @@ -1901,11 +1894,11 @@ unlock_out:
> >  
> >  void mem_cgroup_uncharge_page(struct page *page)
> >  {
> > -	/* early check. */
> > -	if (page_mapped(page))
> > -		return;
> > -	if (page->mapping && !PageAnon(page))
> > -		return;
> > +	/*
> > + 	 * Called when anonymous page's page->mapcount goes down to zero,
> > + 	 * or cancel a charge gotten by newpage_charge().
> > +	 */
> > +	VM_BUG_ON(page_mapped(page) || (page->mapping && !PageAnon(page)));
> >  	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
> >  }
> >  
> > 
> I hit this VM_BUG_ON() when I'm testing mmotm-2009-09-25-14-35 + these patches
> + my latest recharge-at-task-move patches(not posted yet).
> 

Ouch.


> [ 2966.031815] kernel BUG at mm/memcontrol.c:2014!
> [ 2966.031815] invalid opcode: 0000 [#1] SMP
> [ 2966.031815] last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:09:00.2/0000:0b:
> 04.1/irq
> [ 2966.031815] CPU 2
> [ 2966.031815] Modules linked in: autofs4 lockd sunrpc iscsi_tcp libiscsi_tcp libiscsi scs
> i_transport_iscsi dm_mirror dm_multipath video output lp sg ide_cd_mod cdrom serio_raw but
> ton parport_pc parport e1000 i2c_i801 i2c_core pata_acpi ata_generic pcspkr dm_region_hash
>  dm_log dm_mod ata_piix libata shpchp megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd u
> hci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
> [ 2966.031815] Pid: 19677, comm: mmapstress10 Not tainted 2.6.31-mmotm-2009-09-25-14-35-35
> ace741 #1 Express5800/140Rd-4 [N8100-1065]
> [ 2966.031815] RIP: 0010:[<ffffffff800efbe2>]  [<ffffffff800efbe2>] mem_cgroup_uncharge_pa
> ge+0x18/0x28
> [ 2966.031815] RSP: 0000:ffff880381713da8  EFLAGS: 00010246
> [ 2966.031815] RAX: 0000000000000000 RBX: ffffea001668fef0 RCX: ffffea001763afe0
> [ 2966.031815] RDX: 0000000000000080 RSI: 0000000000000008 RDI: ffffea001668fef0
> [ 2966.031815] RBP: ffff880381713da8 R08: ffffea001763afe0 R09: ffff88039dfafa28
> [ 2966.031815] R10: ffff880381617b80 R11: 0000000000000000 R12: ffffea001668fef0
> [ 2966.031815] R13: ffffea001668fef0 R14: 0000000000000008 R15: ffff880381617b80
> [ 2966.031815] FS:  00007f9a617fa6e0(0000) GS:ffff88002a000000(0000) knlGS:000000000000000
> 0
> [ 2966.031815] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 2966.031815] CR2: 000000385021c4b8 CR3: 0000000372107000 CR4: 00000000000006e0
> [ 2966.031815] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2966.031815] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 2966.031815] Process mmapstress10 (pid: 19677, threadinfo ffff880381712000, task ffff880
> 3817c2920)
> [ 2966.031815] Stack:
> [ 2966.031815]  ffff880381713dc8 ffffffff800d6bff 00000003992ec067 00000003992ec067
> [ 2966.031815] <0> ffff880381713e58 ffffffff800ce2f3 ffff880381713e18 ffff88038a90c408
> [ 2966.031815] <0> 000000385021c4b8 ffff88039dfaf6c0 ffffea000168fef0 ffffea0016c91508
> [ 2966.031815] Call Trace:
> [ 2966.031815]  [<ffffffff800d6bff>] page_remove_rmap+0x28/0x44
> [ 2966.031815]  [<ffffffff800ce2f3>] do_wp_page+0x626/0x73c
> [ 2966.031815]  [<ffffffff8005e9e3>] ? __wake_up_bit+0x2c/0x2e
> [ 2966.031815]  [<ffffffff800cfbfc>] handle_mm_fault+0x712/0x824
> [ 2966.031815]  [<ffffffff8034e6d7>] do_page_fault+0x255/0x2e5
> [ 2966.031815]  [<ffffffff8034c59f>] page_fault+0x1f/0x30
> [ 2966.031815] Code: 83 7f 18 00 74 04 0f 0b eb fe 31 f6 e8 75 fe ff ff c9 c3 8b 47 0c 55
> 48 89 e5 85 c0 79 0d 48 8b 47 18 48 85 c0 74 08 a8 01 75 04 <0f> 0b eb fe be 01 00 00 00 e
> 8 4d fe ff ff c9 c3 55 65 48 8b 04
> [ 2966.031815] RIP  [<ffffffff800efbe2>] mem_cgroup_uncharge_page+0x18/0x28
> [ 2966.031815]  RSP <ffff880381713da8>
> 
> I don't think my patch is the guilt because this also happens even if I don't
> set "recharge_at_immigrate".
> 
> It might be better that I test w/o my patches, but IIUC, this can happen
> in following scenario like:
> 
> Assume process A and B has the same swap pte.
> 
>           process A                       process B
>   do_swap_page()                    do_swap_page()
>     read_swap_cache_async()
>     lock_page()                       lookup_swap_cache()
>     page_add_anon_rmap()
>     unlock_page()
>                                       lock_page()
>     do_wp_page()
I think do_wp_page() do lock_page() here.
Because the page is ANON.

>       page_remove_rmap()
>         atomic_add_negative()
>           -> page->_mapcount = -1     page_add_anon_rmap()
>                                         atomic_inc_and_test()
>                                           -> page->_mapcount = 0
>         mem_cgroup_uncharge_page()
>           -> hit the VM_BUG_ON()
> 
> 
> So, I think this should be like:
> 
> void mem_cgroup_uncharge_page(struct page *page)
> {
> 	/*
> 	 * Called when anonymous page's page->mapcount goes down to zero,
> 	 * or cancel a charge gotten by newpage_charge().
> 	 * But there is a small race between page_remove_rmap() and
> 	 * page_add_anon_rmap(), so we can reach here with page_mapped().
> 	 */
> 	VM_BUG_ON(page->mapping && !PageAnon(page))
>         if (unlikely(page_mapped(page))
> 		return;
> 	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
> }
> 

>From backtrace, mem_cgroup_uncharge_page() is called via page_remove_rmap().
Then, page_mapped() should be false.

Hmm.
 VM_BUG_ON(page_mapped(page) || (page->mapping && !PageAnon(page)));

mem_cgroup_uncharge_page() is called by page_remove_rmap() only when
 !page_mapped() and PageAnon(page) == true.
Could you show me disassemble of (head lines of) mem_cgroup_uncharge_page() ?

Maybe there is something I don't understand..
IIUC, when page_remove_rmap() is called by do_wp_page(),
there must be pte(s) which points to the page and a pte is guarded by
page table lock. So, I think page_mapcount() > 0 before calling page_remove_rmap()
because there must be a valid pte, at least.

Can this scenario happen ?
==
    Thread A.                                      Thread B.

    do_wp_page()                                 do_swap_page()
       PageAnon(oldpage)                         
         lock_page()                             lock_page()=> wait.
         reuse = false.
         unlock_page()                           get lock.      
       do copy-on-write
       pte_same() == true
         page_remove_rmap(oldpage) (mapcount goes to -1)
                                                 page_set_anon_rmap() (new anon rmap again)
==
Then, oldpage's mapcount goes down to 0 and up to 1 immediately.

About charge itself, Thread B. executes swap's charge sequence and maybe
no problem. Hmm.

I'll apply your change. Thank you for report.

Regards,
-Kame



> 
> 
> Thanks,
> Daisuke Nishimura.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-09-29  1:17 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-25  8:17 [RFC][PATCH 0/10] memcg clean up and some fixes for softlimit (Sep25) KAMEZAWA Hiroyuki
2009-09-25  8:18 ` [RFC][PATCH 1/10] memcg : modifications for softlimit uncharge KAMEZAWA Hiroyuki
2009-09-25  8:20 ` [RFC][PATCH 2/10] memcg : clean up in softlimit charge KAMEZAWA Hiroyuki
2009-09-25  8:21 ` [RFC][PATCH 3/10] memcg: reorganize memcontrol.c KAMEZAWA Hiroyuki
2009-09-25  8:22 ` [RFC][PATCH 4/10] memcg: add memcg charge cancel KAMEZAWA Hiroyuki
2009-09-25  8:24 ` [RFC][PATCH 5/10] memcg: clean up percpu statistics access KAMEZAWA Hiroyuki
2009-09-25  8:25 ` [RFC][PATCH 6/10] memcg: remove unsued macros KAMEZAWA Hiroyuki
2009-09-25  8:25 ` [RFC][PATCH 0/10] memcg clean up and some fixes for softlimit (Sep25) Balbir Singh
2009-09-25  8:27 ` [RFC][PATCH 7/10] memcg: replace cont with cgroup KAMEZAWA Hiroyuki
2009-09-25  8:28 ` [RFC][PATCH 8/10] memcg: clean up charge/uncharge anon KAMEZAWA Hiroyuki
2009-09-29  0:24   ` Daisuke Nishimura
2009-09-29  1:26     ` KAMEZAWA Hiroyuki [this message]
2009-09-29  2:18       ` Daisuke Nishimura
2009-09-29  3:03         ` Daisuke Nishimura
2009-09-29  3:14           ` KAMEZAWA Hiroyuki
2009-09-25  8:29 ` [RFC][PATCH 9/10] memcg: clean up perzone stat KAMEZAWA Hiroyuki
2009-09-25  8:30 ` [RFC][PATCH 10/10] memcg: add commentary KAMEZAWA Hiroyuki
2009-09-30  2:21   ` Daisuke Nishimura
2009-09-30  2:41     ` KAMEZAWA Hiroyuki
2009-09-30  4:36       ` Daisuke Nishimura

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090929102653.612cc2a4.kamezawa.hiroyu@jp.fujitsu.com \
    --to=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=linux-mm@kvack.org \
    --cc=nishimura@mxp.nes.nec.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox