From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-gg0-f174.google.com (mail-gg0-f174.google.com [209.85.161.174]) by kanga.kvack.org (Postfix) with ESMTP id 5D2B86B0035 for ; Thu, 26 Dec 2013 00:51:12 -0500 (EST) Received: by mail-gg0-f174.google.com with SMTP id v2so1650387ggc.19 for ; Wed, 25 Dec 2013 21:51:12 -0800 (PST) Received: from mail-ie0-x231.google.com (mail-ie0-x231.google.com [2607:f8b0:4001:c03::231]) by mx.google.com with ESMTPS id z48si27306200yha.231.2013.12.25.21.51.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 25 Dec 2013 21:51:11 -0800 (PST) Received: by mail-ie0-f177.google.com with SMTP id tp5so7955061ieb.8 for ; Wed, 25 Dec 2013 21:51:10 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <52BB847F.5080600@oracle.com> References: <52B1C143.8080301@oracle.com> <52B871B2.7040409@oracle.com> <20131224025127.GA2835@lge.com> <52B8F8F6.1080500@oracle.com> <20131224060705.GA16140@lge.com> <20131224074546.GB27156@lge.com> <52BB847F.5080600@oracle.com> Date: Thu, 26 Dec 2013 14:51:10 +0900 Message-ID: Subject: Re: mm: kernel BUG at include/linux/swapops.h:131! From: Konstantin Khlebnikov Content-Type: multipart/alternative; boundary=14dae934088b874d3c04ee6990d2 Sender: owner-linux-mm@kvack.org List-ID: To: Bob Liu Cc: linux-mm@kvack.org, Linux Kernel Mailing List , Joonsoo Kim , Andrew Morton , Sasha Levin , Wanpeng Li --14dae934088b874d3c04ee6990d2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hmm. This kind of race looks impossible: dup_mmap() always places child's vma in into rmap tree after parent's one. For file-vma it's done explicitly (vma_interval_tree_insert_after), for anon vma it's true because rb-tree insert function goes to right branch if elements are equal. Thus remove_migration_ptes() sees parent's pte first: If child has the copy this function will check it after that. And they are already synchronized with parent's and child's pte locks.=EF= =BB=BF On Dec 26, 2013 10:21 AM, "Bob Liu" wrote: > > On 12/24/2013 03:45 PM, Joonsoo Kim wrote: > > On Tue, Dec 24, 2013 at 03:07:05PM +0900, Joonsoo Kim wrote: > >> On Mon, Dec 23, 2013 at 10:01:10PM -0500, Sasha Levin wrote: > >>> On 12/23/2013 09:51 PM, Joonsoo Kim wrote: > >>>> On Mon, Dec 23, 2013 at 12:24:02PM -0500, Sasha Levin wrote: > >>>>>> Ping? > >>>>>> > >>>>>> I've also Cc'ed the "this page shouldn't be locked at all" team. > >>>> Hello, > >>>> > >>>> I can't find the reason of this problem. > >>>> If it is reproducible, how about bisecting? > >>> > >>> While it reproduces under fuzzing it's pretty hard to bisect it with > >>> the amount of issues uncovered by trinity recently. > >>> > >>> I can add any debug code to the site of the BUG if that helps. > >> > >> Good! > >> It will be helpful to add dump_page() in migration_entry_to_page(). > >> > >> Thanks. > >> > > > > Minchan teaches me that there is possible race condition between > > fork and migration. > > > > Please consider following situation. > > > > > > Process A (do migration) Process B (parents) Process C (child) > > > > try_to_unmap() for migration fork > > setup migration entry to B's vma > > ... > > try_to_unmap() for migration > > move_to_new_page() > > > > link new vma > > into interval tree > > remove_migration_ptes() > > check and clear migration entry on C's vma > > ... copy_one_pte: > > ... now, B and C have migration entry > > ... > > ... > > check and clear migration entry on B's vma > > ... > > ... > > remove_migration_ptes() > > > > > > Eventually, migration entry on C's vma is left. > > And then, when C exits, above BUG_ON() can be triggered. > > > > Yes, Looks like this is a potential race condition. > > > I'm not sure the I am right, so please think of it together. :) > > And I'm not sure again that above assumption is related to this trigger report, > > since this may exist for a long time. > > > > So my question to mm folks is is above assumption possible and do we have > > any protection mechanism on this race? > > > > I think we can down_read(&mm->mmap_sem) before remove_migration_ptes() > to fix this issue, but I don't have time to verify it currently. > > -- > Regards, > -Bob > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org --14dae934088b874d3c04ee6990d2 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

Hmm. This kind of race looks impossible: dup_mmap() always p= laces child's
vma in into rmap tree after parent's one. For file-vma it's done ex= plicitly
(vma_interval_tree_insert_after), for anon vma it's true because rb-tre= e
insert function goes to right branch if elements are equal.

Thus remove_migration_ptes() sees parent's pte first: If child has the copy this function will check it after that.
And they are already synchronized with parent's and child's pte loc= ks.=EF=BB=BF

On Dec 26, 2013 10:21 AM, "Bob Liu" <bob.liu@oracle.com> wrote:
>
> On 12/24/2013 03:45 PM, Joonsoo Kim wrote:
> > On Tue, Dec 24, 2013 at 03:07:05PM +0900, Joonsoo Kim wrote:
> >> On Mon, Dec 23, 2013 at 10:01:10PM -0500, Sasha Levin wrote:<= br> > >>> On 12/23/2013 09:51 PM, Joonsoo Kim wrote:
> >>>> On Mon, Dec 23, 2013 at 12:24:02PM -0500, Sasha Levin= wrote:
> >>>>>> Ping?
> >>>>>>
> >>>>>> I've also Cc'ed the "this page s= houldn't be locked at all" team.
> >>>> Hello,
> >>>>
> >>>> I can't find the reason of this problem.
> >>>> If it is reproducible, how about bisecting?
> >>>
> >>> While it reproduces under fuzzing it's pretty hard to= bisect it with
> >>> the amount of issues uncovered by trinity recently.
> >>>
> >>> I can add any debug code to the site of the BUG if that h= elps.
> >>
> >> Good!
> >> It will be helpful to add dump_page() in migration_entry_to_p= age().
> >>
> >> Thanks.
> >>
> >
> > Minchan teaches me that there is possible race condition between<= br> > > fork and migration.
> >
> > Please consider following situation.
> >
> >
> > Process A (do migration) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Process B (parents) Process C (child)=
> >
> > try_to_unmap() for migration <begin> =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0fork
> > setup migration entry to B's vma
> > ...
> > try_to_unmap() for migration <end>
> > move_to_new_page()
> >
> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 link new vma
> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 into interval tree
> > remove_migration_ptes() <begin>
> > check and clear migration entry on C's vma
> > ... =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 copy_one_pte:
> > ... =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 now, B and C have migration entry
> > ...
> > ...
> > check and clear migration entry on B's vma
> > ...
> > ...
> > remove_migration_ptes() <end>
> >
> >
> > Eventually, migration entry on C's vma is left.
> > And then, when C exits, above BUG_ON() can be triggered.
> >
>
> Yes, Looks like this is a potential race condition.
>
> > I'm not sure the I am right, so please think of it together. = :)
> > And I'm not sure again that above assumption is related to th= is trigger report,
> > since this may exist for a long time.
> >
> > So my question to mm folks is is above assumption possible and do= we have
> > any protection mechanism on this race?
> >
>
> I think we can down_read(&mm->mmap_sem) before remove_migration= _ptes()
> to fix this issue, but I don't have time to verify it currently. >
> --
> Regards,
> -Bob
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in<= br> > the body to majordomo@kvack.org= . =C2=A0For more info on Linux MM,
> see: http://www.linux-mm.org/= .
> Don't email: <a href=3Dmailto:"dont@kvack.org"> emai= l@kvack.org </a>

--14dae934088b874d3c04ee6990d2-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org