From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ua0-f197.google.com (mail-ua0-f197.google.com [209.85.217.197])
	by kanga.kvack.org (Postfix) with ESMTP id 888046B028B
	for ; Fri, 28 Oct 2016 18:25:17 -0400 (EDT)
Received: by mail-ua0-f197.google.com with SMTP id 51so47577887uai.3
	for ; Fri, 28 Oct 2016 15:25:17 -0700 (PDT)
Received: from mail-qt0-x243.google.com (mail-qt0-x243.google.com. [2607:f8b0:400d:c0d::243])
	by mx.google.com with ESMTPS id 92si7375849uaw.220.2016.10.28.15.25.16
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Fri, 28 Oct 2016 15:25:16 -0700 (PDT)
Received: by mail-qt0-x243.google.com with SMTP id 23so2465963qtp.2
	for ; Fri, 28 Oct 2016 15:25:16 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20161028145215.87fd39d8f8822a2cd11b621c@linux-foundation.org>
References: <20161028145215.87fd39d8f8822a2cd11b621c@linux-foundation.org>
From: Joseph Yasi
Date: Fri, 28 Oct 2016 18:25:15 -0400
Message-ID:
Subject: Re: [Bug 180101] New: BUG: unable to handle kernel paging request at x with "mm: remove gup_flags FOLL_WRITE games from __get_user_pages()"
Content-Type: multipart/alternative; boundary=001a114069a42139de053ff4554e
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton
Cc: bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org, Linus Torvalds

--001a114069a42139de053ff4554e
Content-Type: text/plain; charset=UTF-8

On Fri, Oct 28, 2016 at 5:52 PM, Andrew Morton wrote:
>
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Mon, 24 Oct 2016 01:27:15 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=180101
> >
> >             Bug ID: 180101
> >            Summary: BUG: unable to handle kernel paging request at x with
> >                     "mm: remove gup_flags FOLL_WRITE games from
> >                     __get_user_pages()"
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 4.8.4
> >           Hardware: x86-64
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@linux-foundation.org
> >           Reporter: joe.yasi@gmail.com
> >         Regression: No
> >
> > After updating to 4.8.3 and 4.8.4, I am having stability issues. I can also
> > reproduce them with 4.7.10. This issue does not occur with 4.8.2. I can also
> > not reproduce after reverting the security fix
> > 89eeba1594ac641a30b91942961e80fae978f839 "mm: remove gup_flags FOLL_WRITE games
> > from __get_user_pages()" with 4.8.4.
>
> That's 19be0eaffa3ac7d8eb ("mm: remove gup_flags FOLL_WRITE games from
> __get_user_pages()") in the upstream tree.
>
> I seem to recall a fix for that patch went flying past earlier this
> week.  Perhaps Linus recalls?
>
> 19be0eaffa3ac7d8eb has gone into a billion -stable trees so we'll need
> to be attentive...
>

I've since been able to reproduce the issue with 19be0eaffa3ac7d8eb ("mm: remove gup_flags FOLL_WRITE games from __get_user_pages()") reverted. I initially suspected that commit because I hadn't seen the issue until 4.8.3, and I also saw it when I tried 4.7.10. At first I wasn't able to reproduce it with 4.8.2, but I have since managed to do so. This smells like a race condition somewhere. It's possible I just happened to never encounter that race before.

The /home partition in question is btrfs on bcache in writethrough mode. The cache drive is a 180 GB Intel SATA SSD, and the backing device is two WD 3 TB SATA HDDs configured in MD RAID 10 with the f2 layout. / is btrfs on an NVMe SSD.

I've also seen btrfs checksum errors in the kernel log when reproducing this. Rebooting and running btrfs scrub finds nothing, though, so it looks like in-memory corruption.

Thanks,
Joe

--001a114069a42139de053ff4554e
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

--001a114069a42139de053ff4554e--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org