On Fri, Oct 28, 2016 at 5:52 PM, Andrew Morton wrote:
>
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Mon, 24 Oct 2016 01:27:15 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=180101
> >
> >             Bug ID: 180101
> >            Summary: BUG: unable to handle kernel paging request at x with
> >                     "mm: remove gup_flags FOLL_WRITE games from
> >                     __get_user_pages()"
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 4.8.4
> >           Hardware: x86-64
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@linux-foundation.org
> >           Reporter: joe.yasi@gmail.com
> >         Regression: No
> >
> > After updating to 4.8.3 and 4.8.4, I am having stability issues. I can also
> > reproduce them with 4.7.10. This issue does not occur with 4.8.2. I can also
> > not reproduce after reverting the security fix
> > 89eeba1594ac641a30b91942961e80fae978f839 "mm: remove gup_flags FOLL_WRITE games
> > from __get_user_pages()" with 4.8.4.
>
> That's 19be0eaffa3ac7d8eb ("mm: remove gup_flags FOLL_WRITE games from
> __get_user_pages()") in the upstream tree.
>
> I seem to recall a fix for that patch went flying past earlier this
> week. Perhaps Linus recalls?
>
> 19be0eaffa3ac7d8eb has gone into a billion -stable trees so we'll need
> to be attentive...
>

I've been able to reproduce the issue with 19be0eaffa3ac7d8eb ("mm: remove gup_flags FOLL_WRITE games from __get_user_pages()") reverted. I initially suspected it because I hadn't seen the issue until 4.8.3, and also saw it when I tried 4.7.10. Initially, I wasn't able to reproduce it with 4.8.2, but I've since been able to do that.

This smells like a race condition somewhere. It's possible I just happened to never encounter that race before.

The /home partition in question is btrfs on bcache in writethrough mode.
The cache drive is a 180 GB Intel SATA SSD, and the backing device is two WD 3 TB SATA HDDs configured in an MD RAID 10 f2 layout. / is btrfs on an NVMe SSD.

I've also seen btrfs checksum errors in the kernel log when reproducing this. Rebooting and running btrfs scrub finds nothing, though, so it looks like in-memory corruption rather than on-disk corruption.

Thanks,
Joe