* Re: Hang with v4.15-rc trying to swap back in [not found] <1514398340.3986.10.camel@HansenPartnership.com> @ 2017-12-27 20:50 ` James Bottomley 2017-12-27 23:26 ` Minchan Kim 0 siblings, 1 reply; 11+ messages in thread From: James Bottomley @ 2017-12-27 20:50 UTC (permalink / raw) To: Linux Memory Management List; +Cc: Minchan Kim, Andrew Morton Reverting these three patches fixes the problem: commit aa8d22a11da933dbf880b4933b58931f4aefe91c Author: Minchan Kim <minchan@kernel.org> Date:A A A Wed Nov 15 17:33:11 2017 -0800 A A A A mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 Author: Minchan Kim <minchan@kernel.org> Date:A A A Wed Nov 15 17:33:07 2017 -0800 A A A A mm, swap: skip swapcache for swapin of synchronous device Also need to revert: commit e9a6effa500526e2a19d5ad042cb758b55b1ef93 Author: Huang Ying <huang.ying.caritas@gmail.com> Date:A A A Wed Nov 15 17:33:15 2017 -0800 A A A A mm, swap: fix false error message in __swp_swapcount() (The latter is simply because it used a function that is eliminated by one of the other reversions). A They came into the merge window via the -mm tree as part of a 4 part series: Subject: [PATCH v2 0/4] skip swapcache for super fast device Message-Id: <1505886205-9671-1-git-send-email-minchan@kernel.org > James On Wed, 2017-12-27 at 10:12 -0800, James Bottomley wrote: > I think I've seen this a lot shutting down systems, but never manged > to trace it before. A Now I can reproduce it starting kvm on a 4GB > system with 3GB of memory and booting up a linux OS. A What eventually > happens (after logging into the virtual system) is that kvm itself > hangs, although the stuck process is one of the kworkers in D wait > hung on this stack trace: > > [<0>] io_schedule+0x12/0x40 > [<0>] __lock_page_or_retry+0x2b8/0x300 > [<0>] do_swap_page+0x1b9/0x910 > [<0>] __handle_mm_fault+0x7ee/0xe20 > [<0>] handle_mm_fault+0xce/0x1e0 > [<0>] __get_user_pages+0x104/0x6c0 > [<0>] get_user_pages_remote+0x84/0x1f0 > [<0>] async_pf_execute+0x67/0x1a0 [kvm] > [<0>] process_one_work+0x13c/0x370 > [<0>] worker_thread+0x44/0x3e0 > [<0>] kthread+0xf5/0x130 > [<0>] ret_from_fork+0x1f/0x30 > [<0>] 0xffffffffffffffff > > The async_pf_execute() is a kvm async callback, failure to execute it > appears to be causing the kvm hang. > > Next to go is kswapd > > [<0>] io_schedule+0x12/0x40 > [<0>] __lock_page+0xec/0x120 > [<0>] deferred_split_scan+0x21b/0x2a0 > [<0>] shrink_slab+0x24a/0x460 > [<0>] shrink_node+0x2e6/0x2f0 > [<0>] kswapd+0x2ad/0x730 > [<0>] kthread+0xf5/0x130 > [<0>] ret_from_fork+0x1f/0x30 > [<0>] 0xffffffffffffffff > > And finally systemd-logind hangs making it very difficult to get into > the system (and impossible to shut it down) > > [<0>] call_rwsem_down_write_failed+0x13/0x20 > [<0>] register_shrinker+0x45/0xa0 > [<0>] sget_userns+0x44d/0x480 > [<0>] mount_nodev+0x2a/0xa0 > [<0>] mount_fs+0x34/0x150 > [<0>] vfs_kern_mount+0x62/0x120 > [<0>] do_mount+0x1d7/0xbf0 > [<0>] SyS_mount+0x7e/0xd0 > [<0>] do_syscall_64+0x5b/0x100 > [<0>] entry_SYSCALL64_slow_path+0x25/0x25 > [<0>] 0xffffffffffffffff > > I've seen this with -rc1 and -rc5, so I think it's some problem with > merge window code. > > James -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with v4.15-rc trying to swap back in 2017-12-27 20:50 ` Hang with v4.15-rc trying to swap back in James Bottomley @ 2017-12-27 23:26 ` Minchan Kim 2017-12-27 23:34 ` James Bottomley 0 siblings, 1 reply; 11+ messages in thread From: Minchan Kim @ 2017-12-27 23:26 UTC (permalink / raw) To: James Bottomley; +Cc: Linux Memory Management List, Andrew Morton Hello James, On Wed, Dec 27, 2017 at 12:50:17PM -0800, James Bottomley wrote: > Reverting these three patches fixes the problem: > > commit aa8d22a11da933dbf880b4933b58931f4aefe91c > Author: Minchan Kim <minchan@kernel.org> > Date: Wed Nov 15 17:33:11 2017 -0800 > > mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page > has no other reference > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 > Author: Minchan Kim <minchan@kernel.org> > Date: Wed Nov 15 17:33:07 2017 -0800 > > mm, swap: skip swapcache for swapin of synchronous device > > Also need to revert: > > commit e9a6effa500526e2a19d5ad042cb758b55b1ef93 > Author: Huang Ying <huang.ying.caritas@gmail.com> > Date: Wed Nov 15 17:33:15 2017 -0800 > > mm, swap: fix false error message in __swp_swapcount() > > (The latter is simply because it used a function that is eliminated by > one of the other reversions). They came into the merge window via the > -mm tree as part of a 4 part series: > > Subject: [PATCH v2 0/4] skip swapcache for super fast device > Message-Id: <1505886205-9671-1-git-send-email-minchan@kernel.org > > > > James Thanks for the report. Patches are related to synchronous swap devices like brd, zram, nvdimm so 1. What swap device do you use among them? 2. Could you tell me how you can reproduce it? Thanks. > > On Wed, 2017-12-27 at 10:12 -0800, James Bottomley wrote: > > I think I've seen this a lot shutting down systems, but never manged > > to trace it before. Now I can reproduce it starting kvm on a 4GB > > system with 3GB of memory and booting up a linux OS. What eventually > > happens (after logging into the virtual system) is that kvm itself > > hangs, although the stuck process is one of the kworkers in D wait > > hung on this stack trace: > > > > [<0>] io_schedule+0x12/0x40 > > [<0>] __lock_page_or_retry+0x2b8/0x300 > > [<0>] do_swap_page+0x1b9/0x910 > > [<0>] __handle_mm_fault+0x7ee/0xe20 > > [<0>] handle_mm_fault+0xce/0x1e0 > > [<0>] __get_user_pages+0x104/0x6c0 > > [<0>] get_user_pages_remote+0x84/0x1f0 > > [<0>] async_pf_execute+0x67/0x1a0 [kvm] > > [<0>] process_one_work+0x13c/0x370 > > [<0>] worker_thread+0x44/0x3e0 > > [<0>] kthread+0xf5/0x130 > > [<0>] ret_from_fork+0x1f/0x30 > > [<0>] 0xffffffffffffffff > > > > The async_pf_execute() is a kvm async callback, failure to execute it > > appears to be causing the kvm hang. > > > > Next to go is kswapd > > > > [<0>] io_schedule+0x12/0x40 > > [<0>] __lock_page+0xec/0x120 > > [<0>] deferred_split_scan+0x21b/0x2a0 > > [<0>] shrink_slab+0x24a/0x460 > > [<0>] shrink_node+0x2e6/0x2f0 > > [<0>] kswapd+0x2ad/0x730 > > [<0>] kthread+0xf5/0x130 > > [<0>] ret_from_fork+0x1f/0x30 > > [<0>] 0xffffffffffffffff > > > > And finally systemd-logind hangs making it very difficult to get into > > the system (and impossible to shut it down) > > > > [<0>] call_rwsem_down_write_failed+0x13/0x20 > > [<0>] register_shrinker+0x45/0xa0 > > [<0>] sget_userns+0x44d/0x480 > > [<0>] mount_nodev+0x2a/0xa0 > > [<0>] mount_fs+0x34/0x150 > > [<0>] vfs_kern_mount+0x62/0x120 > > [<0>] do_mount+0x1d7/0xbf0 > > [<0>] SyS_mount+0x7e/0xd0 > > [<0>] do_syscall_64+0x5b/0x100 > > [<0>] entry_SYSCALL64_slow_path+0x25/0x25 > > [<0>] 0xffffffffffffffff > > > > I've seen this with -rc1 and -rc5, so I think it's some problem with > > merge window code. > > > > James > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with v4.15-rc trying to swap back in 2017-12-27 23:26 ` Minchan Kim @ 2017-12-27 23:34 ` James Bottomley 2017-12-27 23:56 ` Minchan Kim 0 siblings, 1 reply; 11+ messages in thread From: James Bottomley @ 2017-12-27 23:34 UTC (permalink / raw) To: Minchan Kim; +Cc: Linux Memory Management List, Andrew Morton On Thu, 2017-12-28 at 08:26 +0900, Minchan Kim wrote: > Hello James, > > On Wed, Dec 27, 2017 at 12:50:17PM -0800, James Bottomley wrote: > > > > Reverting these three patches fixes the problem: > > > > commit aa8d22a11da933dbf880b4933b58931f4aefe91c > > Author: Minchan Kim <minchan@kernel.org> > > Date:A A A Wed Nov 15 17:33:11 2017 -0800 > > > > A A A A mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped > > page > > has no other reference > > > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 > > Author: Minchan Kim <minchan@kernel.org> > > Date:A A A Wed Nov 15 17:33:07 2017 -0800 > > > > A A A A mm, swap: skip swapcache for swapin of synchronous device > > > > Also need to revert: > > > > commit e9a6effa500526e2a19d5ad042cb758b55b1ef93 > > Author: Huang Ying <huang.ying.caritas@gmail.com> > > Date:A A A Wed Nov 15 17:33:15 2017 -0800 > > > > A A A A mm, swap: fix false error message in __swp_swapcount() > > > > (The latter is simply because it used a function that is eliminated > > by > > one of the other reversions). A They came into the merge window via > > the > > -mm tree as part of a 4 part series: > > > > Subject: [PATCH v2 0/4] skip swapcache for super fast device > > Message-Id: <1505886205-9671-1-git-send-email-minchan@kernel > > .org > > > > > > > > > > James > > Thanks for the report. > Patches are related to synchronous swap devices like brd, zram, > nvdimm so > > 1. What swap device do you use among them? I've reproduced on nvme and sata spinning rust. > 2. Could you tell me how you can reproduce it? The way to reproduce is to force something to swap and then get it to try to touch the page again. A I do this on my systems by using a large virtual machine, as I said in the email. A There isn't really any definitive reproduction method beyond that. James -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with v4.15-rc trying to swap back in 2017-12-27 23:34 ` James Bottomley @ 2017-12-27 23:56 ` Minchan Kim 2017-12-28 17:41 ` James Bottomley 0 siblings, 1 reply; 11+ messages in thread From: Minchan Kim @ 2017-12-27 23:56 UTC (permalink / raw) To: James Bottomley; +Cc: Linux Memory Management List, Andrew Morton On Wed, Dec 27, 2017 at 03:34:49PM -0800, James Bottomley wrote: > On Thu, 2017-12-28 at 08:26 +0900, Minchan Kim wrote: > > Hello James, > > > > On Wed, Dec 27, 2017 at 12:50:17PM -0800, James Bottomley wrote: > > > > > > Reverting these three patches fixes the problem: > > > > > > commit aa8d22a11da933dbf880b4933b58931f4aefe91c > > > Author: Minchan Kim <minchan@kernel.org> > > > Date: Wed Nov 15 17:33:11 2017 -0800 > > > > > > mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped > > > page > > > has no other reference > > > > > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 > > > Author: Minchan Kim <minchan@kernel.org> > > > Date: Wed Nov 15 17:33:07 2017 -0800 > > > > > > mm, swap: skip swapcache for swapin of synchronous device > > > > > > Also need to revert: > > > > > > commit e9a6effa500526e2a19d5ad042cb758b55b1ef93 > > > Author: Huang Ying <huang.ying.caritas@gmail.com> > > > Date: Wed Nov 15 17:33:15 2017 -0800 > > > > > > mm, swap: fix false error message in __swp_swapcount() > > > > > > (The latter is simply because it used a function that is eliminated > > > by > > > one of the other reversions). They came into the merge window via > > > the > > > -mm tree as part of a 4 part series: > > > > > > Subject: [PATCH v2 0/4] skip swapcache for super fast device > > > Message-Id: <1505886205-9671-1-git-send-email-minchan@kernel > > > .org > > > > > > > > > > > > > > James > > > > Thanks for the report. > > Patches are related to synchronous swap devices like brd, zram, > > nvdimm so > > > > 1. What swap device do you use among them? > > I've reproduced on nvme and sata spinning rust. > > > 2. Could you tell me how you can reproduce it? > > The way to reproduce is to force something to swap and then get it to > try to touch the page again. I do this on my systems by using a large > virtual machine, as I said in the email. There isn't really any > definitive reproduction method beyond that. > Thanks for the information. It seems I made a bug on do_swap_page. I want to confirm before sending formal patch. Could you try on it? diff --git a/mm/memory.c b/mm/memory.c index ca5674cbaff2..240521f1322d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2889,9 +2889,12 @@ int do_swap_page(struct vm_fault *vmf) delayacct_set_flag(DELAYACCT_PF_SWAPIN); - if (!page) + if (!page) { page = lookup_swap_cache(entry, vma_readahead ? vma : NULL, vmf->address); + swapcache = page; + } + if (!page) { struct swap_info_struct *si = swp_swap_info(entry); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with v4.15-rc trying to swap back in 2017-12-27 23:56 ` Minchan Kim @ 2017-12-28 17:41 ` James Bottomley 2017-12-28 19:00 ` James Bottomley 0 siblings, 1 reply; 11+ messages in thread From: James Bottomley @ 2017-12-28 17:41 UTC (permalink / raw) To: Minchan Kim Cc: Linux Memory Management List, Andrew Morton, Thorsten Leemhuis On Thu, 2017-12-28 at 08:56 +0900, Minchan Kim wrote: > On Wed, Dec 27, 2017 at 03:34:49PM -0800, James Bottomley wrote: > > > > On Thu, 2017-12-28 at 08:26 +0900, Minchan Kim wrote: > > > > > > Hello James, > > > > > > On Wed, Dec 27, 2017 at 12:50:17PM -0800, James Bottomley wrote: > > > > > > > > > > > > Reverting these three patches fixes the problem: > > > > > > > > commit aa8d22a11da933dbf880b4933b58931f4aefe91c > > > > Author: Minchan Kim <minchan@kernel.org> > > > > Date:A A A Wed Nov 15 17:33:11 2017 -0800 > > > > > > > > A A A A mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if > > > > swapped page has no other reference > > > > > > > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 > > > > Author: Minchan Kim <minchan@kernel.org> > > > > Date:A A A Wed Nov 15 17:33:07 2017 -0800 > > > > > > > > A A A A mm, swap: skip swapcache for swapin of synchronous device > > > > > > > > Also need to revert: > > > > > > > > commit e9a6effa500526e2a19d5ad042cb758b55b1ef93 > > > > Author: Huang Ying <huang.ying.caritas@gmail.com> > > > > Date:A A A Wed Nov 15 17:33:15 2017 -0800 > > > > > > > > A A A A mm, swap: fix false error message in __swp_swapcount() > > > > > > > > (The latter is simply because it used a function that is > > > > eliminated by one of the other reversions). A They came into the > > > > merge window via the -mm tree as part of a 4 part series: > > > > > > > > Subject: [PATCH v2 0/4] skip swapcache for super fast > > > > device > > > > Message-Id: <1505886205-9671-1-git-send-email- > > > > minchan@kernel.org > > > > > > > > > > > > > > > > > > > > > > > James > > > > > > Thanks for the report. > > > Patches are related to synchronous swap devices like brd, zram, > > > nvdimm so > > > > > > 1. What swap device do you use among them? > > > > I've reproduced on nvme and sata spinning rust. > > > > > > > > 2. Could you tell me how you can reproduce it? > > > > The way to reproduce is to force something to swap and then get it > > to try to touch the page again. A I do this on my systems by using a > > large virtual machine, as I said in the email. A There isn't really > > any definitive reproduction method beyond that. > > > > Thanks for the information. It seems I made a bug on do_swap_page. I > want to confirm before sending formal patch. Could you try on it? > > diff --git a/mm/memory.c b/mm/memory.c > index ca5674cbaff2..240521f1322d 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2889,9 +2889,12 @@ int do_swap_page(struct vm_fault *vmf) > A > A > A delayacct_set_flag(DELAYACCT_PF_SWAPIN); > - if (!page) > + if (!page) { > A page = lookup_swap_cache(entry, vma_readahead ? vma > : NULL, > A A vmf->address); > + swapcache = page; > + } > + This hangs in precisely the same way first kworker then kswapd with the same stack trace. I'd guess that since they're both in io_schedule, the problem is that the io_scheduler is taking far too long servicing the requests due to some priority issue you've introduced. Since we're at -rc5, soon to be -rc6, let's just revert the whole series and you can retry it for 4.16. A The whole point seems to be for zram, which isn't really a huge use case. James -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with v4.15-rc trying to swap back in 2017-12-28 17:41 ` James Bottomley @ 2017-12-28 19:00 ` James Bottomley 2017-12-29 0:00 ` Minchan Kim 2018-01-17 22:33 ` Hugh Dickins 0 siblings, 2 replies; 11+ messages in thread From: James Bottomley @ 2017-12-28 19:00 UTC (permalink / raw) To: Minchan Kim Cc: Linux Memory Management List, Andrew Morton, Thorsten Leemhuis On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote: > I'd guess that since they're both in io_schedule, the problem is that > the io_scheduler is taking far too long servicing the requests due to > some priority issue you've introduced. OK, so after some analysis, that turned out to be incorrect. A The problem seems to be that we're exiting do_swap_page() with locked pages that have been read in from swap. Your changelogs are entirely unclear on why you changed the swapcache setting logic in this patch: commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 Author: Minchan Kim <minchan@kernel.org> Date:A A A Wed Nov 15 17:33:07 2017 -0800 A A A A mm, swap: skip swapcache for swapin of synchronous device But I think you're using swapcache == NULL as a signal the page came from a synchronous device. A In which case the bug is that you've forgotten we may already have picked up a page in swap_readahead_detect() which you're wrongly keeping swapcache == NULL for and the fix is this (it works on my system, although I'm still getting an unaccountable shutdown delay). I still think we should revert this series, because this may not be the only bug lurking in the code, so it should go through a lot more rigorous testing than it has. James --- diff --git a/mm/memory.c b/mm/memory.c index ca5674cbaff2..31f9845c340e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2847,7 +2847,7 @@ EXPORT_SYMBOL(unmap_mapping_range); int do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; - struct page *page = NULL, *swapcache = NULL; + struct page *page = NULL, *swapcache; struct mem_cgroup *memcg; struct vma_swap_readahead swap_ra; swp_entry_t entry; @@ -2892,6 +2892,7 @@ int do_swap_page(struct vm_fault *vmf) if (!page) page = lookup_swap_cache(entry, vma_readahead ? vma : NULL, vmf->address); + swapcache = page; if (!page) { struct swap_info_struct *si = swp_swap_info(entry); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with v4.15-rc trying to swap back in 2017-12-28 19:00 ` James Bottomley @ 2017-12-29 0:00 ` Minchan Kim 2017-12-29 0:45 ` Minchan Kim 2018-01-17 22:33 ` Hugh Dickins 1 sibling, 1 reply; 11+ messages in thread From: Minchan Kim @ 2017-12-29 0:00 UTC (permalink / raw) To: James Bottomley Cc: Linux Memory Management List, Andrew Morton, Thorsten Leemhuis On Thu, Dec 28, 2017 at 11:00:40AM -0800, James Bottomley wrote: > On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote: > > I'd guess that since they're both in io_schedule, the problem is that > > the io_scheduler is taking far too long servicing the requests due to > > some priority issue you've introduced. > > OK, so after some analysis, that turned out to be incorrect. The > problem seems to be that we're exiting do_swap_page() with locked pages > that have been read in from swap. > > Your changelogs are entirely unclear on why you changed the swapcache > setting logic in this patch: > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 > Author: Minchan Kim <minchan@kernel.org> > Date: Wed Nov 15 17:33:07 2017 -0800 > > mm, swap: skip swapcache for swapin of synchronous device > > But I think you're using swapcache == NULL as a signal the page came > from a synchronous device. In which case the bug is that you've Exactly. Because the patchset aims for skipping swap cache for synchronous device and some logics of do_swap_page has has assumed the page is on swap cache. > forgotten we may already have picked up a page in > swap_readahead_detect() which you're wrongly keeping swapcache == NULL > for and the fix is this (it works on my system, although I'm still > getting an unaccountable shutdown delay). SIGH. I missed that. > > I still think we should revert this series, because this may not be the > only bug lurking in the code, so it should go through a lot more > rigorous testing than it has. I have no problem. It's not urgent. Andrew, this is reverting patch based on 4.15-rc5. And I need to send another revert patch against on mmotm because it would have conflict due to vma-based readahead restructuring patch. I will send soon. Thanks. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with v4.15-rc trying to swap back in 2017-12-29 0:00 ` Minchan Kim @ 2017-12-29 0:45 ` Minchan Kim 0 siblings, 0 replies; 11+ messages in thread From: Minchan Kim @ 2017-12-29 0:45 UTC (permalink / raw) To: James Bottomley Cc: Linux Memory Management List, Andrew Morton, Thorsten Leemhuis On Fri, Dec 29, 2017 at 09:00:16AM +0900, Minchan Kim wrote: > On Thu, Dec 28, 2017 at 11:00:40AM -0800, James Bottomley wrote: > > On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote: > > > I'd guess that since they're both in io_schedule, the problem is that > > > the io_scheduler is taking far too long servicing the requests due to > > > some priority issue you've introduced. > > > > OK, so after some analysis, that turned out to be incorrect. The > > problem seems to be that we're exiting do_swap_page() with locked pages > > that have been read in from swap. > > > > Your changelogs are entirely unclear on why you changed the swapcache > > setting logic in this patch: > > > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 > > Author: Minchan Kim <minchan@kernel.org> > > Date: Wed Nov 15 17:33:07 2017 -0800 > > > > mm, swap: skip swapcache for swapin of synchronous device > > > > But I think you're using swapcache == NULL as a signal the page came > > from a synchronous device. In which case the bug is that you've > > Exactly. Because the patchset aims for skipping swap cache for synchronous > device and some logics of do_swap_page has has assumed the page is on > swap cache. > > > forgotten we may already have picked up a page in > > swap_readahead_detect() which you're wrongly keeping swapcache == NULL > > for and the fix is this (it works on my system, although I'm still > > getting an unaccountable shutdown delay). > > SIGH. I missed that. > > > > > I still think we should revert this series, because this may not be the > > only bug lurking in the code, so it should go through a lot more > > rigorous testing than it has. > > I have no problem. It's not urgent. > > Andrew, this is reverting patch based on 4.15-rc5. And I need to send > another revert patch against on mmotm because it would have conflict due to > vma-based readahead restructuring patch. I will send soon. > We should revert (23c47d2ada9f, bdi: introduce BDI_CAP_SYNCHRONOUS_IO) as well. I resend reverting patch with including 23c47d2ada9f. Andrew, Pleas take this patch for linus-tree. Sorry for the confusion. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with v4.15-rc trying to swap back in 2017-12-28 19:00 ` James Bottomley 2017-12-29 0:00 ` Minchan Kim @ 2018-01-17 22:33 ` Hugh Dickins 2018-01-17 22:58 ` Andrew Morton 1 sibling, 1 reply; 11+ messages in thread From: Hugh Dickins @ 2018-01-17 22:33 UTC (permalink / raw) To: Andrew Morton Cc: Minchan Kim, James Bottomley, Linux Memory Management List, Thorsten Leemhuis [-- Attachment #1: Type: TEXT/PLAIN, Size: 2413 bytes --] On Thu, 28 Dec 2017, James Bottomley wrote: > On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote: > > I'd guess that since they're both in io_schedule, the problem is that > > the io_scheduler is taking far too long servicing the requests due to > > some priority issue you've introduced. > > OK, so after some analysis, that turned out to be incorrect. The > problem seems to be that we're exiting do_swap_page() with locked pages > that have been read in from swap. > > Your changelogs are entirely unclear on why you changed the swapcache > setting logic in this patch: > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 > Author: Minchan Kim <minchan@kernel.org> > Date: Wed Nov 15 17:33:07 2017 -0800 > > mm, swap: skip swapcache for swapin of synchronous device > > But I think you're using swapcache == NULL as a signal the page came > from a synchronous device. In which case the bug is that you've > forgotten we may already have picked up a page in > swap_readahead_detect() which you're wrongly keeping swapcache == NULL > for and the fix is this (it works on my system, although I'm still > getting an unaccountable shutdown delay). > > I still think we should revert this series, because this may not be the > only bug lurking in the code, so it should go through a lot more > rigorous testing than it has. Andrew, neither the fix below (works for me, though I have seen other swap funniness, most probably unrelated), nor the reversion preferred by James and Minchan (later in this linux-mm thread), was in 4.15-rc8: the sands of time are running out... Hugh > > James > > --- > > diff --git a/mm/memory.c b/mm/memory.c > index ca5674cbaff2..31f9845c340e 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2847,7 +2847,7 @@ EXPORT_SYMBOL(unmap_mapping_range); > int do_swap_page(struct vm_fault *vmf) > { > struct vm_area_struct *vma = vmf->vma; > - struct page *page = NULL, *swapcache = NULL; > + struct page *page = NULL, *swapcache; > struct mem_cgroup *memcg; > struct vma_swap_readahead swap_ra; > swp_entry_t entry; > @@ -2892,6 +2892,7 @@ int do_swap_page(struct vm_fault *vmf) > if (!page) > page = lookup_swap_cache(entry, vma_readahead ? vma : NULL, > vmf->address); > + swapcache = page; > if (!page) { > struct swap_info_struct *si = swp_swap_info(entry); > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with v4.15-rc trying to swap back in 2018-01-17 22:33 ` Hugh Dickins @ 2018-01-17 22:58 ` Andrew Morton 2018-01-17 23:15 ` Hugh Dickins 0 siblings, 1 reply; 11+ messages in thread From: Andrew Morton @ 2018-01-17 22:58 UTC (permalink / raw) To: Hugh Dickins Cc: Minchan Kim, James Bottomley, Linux Memory Management List, Thorsten Leemhuis On Wed, 17 Jan 2018 14:33:21 -0800 (PST) Hugh Dickins <hughd@google.com> wrote: > On Thu, 28 Dec 2017, James Bottomley wrote: > > On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote: > > > I'd guess that since they're both in io_schedule, the problem is that > > > the io_scheduler is taking far too long servicing the requests due to > > > some priority issue you've introduced. > > > > OK, so after some analysis, that turned out to be incorrect. The > > problem seems to be that we're exiting do_swap_page() with locked pages > > that have been read in from swap. > > > > Your changelogs are entirely unclear on why you changed the swapcache > > setting logic in this patch: > > > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 > > Author: Minchan Kim <minchan@kernel.org> > > Date: Wed Nov 15 17:33:07 2017 -0800 > > > > mm, swap: skip swapcache for swapin of synchronous device > > > > But I think you're using swapcache == NULL as a signal the page came > > from a synchronous device. In which case the bug is that you've > > forgotten we may already have picked up a page in > > swap_readahead_detect() which you're wrongly keeping swapcache == NULL > > for and the fix is this (it works on my system, although I'm still > > getting an unaccountable shutdown delay). > > > > I still think we should revert this series, because this may not be the > > only bug lurking in the code, so it should go through a lot more > > rigorous testing than it has. > > Andrew, neither the fix below (works for me, though I have seen other > swap funniness, most probably unrelated), nor the reversion preferred > by James and Minchan (later in this linux-mm thread), was in 4.15-rc8: > the sands of time are running out... Yup. I'm actually planning on sending in this one. OK by you? From: Minchan Kim <minchan@kernel.org> Subject: mm/memory.c: release locked page in do_swap_page() James reported a bug in swap paging-in from his testing. It is that do_swap_page doesn't release locked page so system hang-up happens due to a deadlock on PG_locked. It was introduced by 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device") because I missed swap cache hit places to update swapcache variable to work well with other logics against swapcache in do_swap_page. This patch fixes it. Debugged by James Bottomley. Link: http://lkml.kernel.org/r/<1514407817.4169.4.camel@HansenPartnership.com> Link: http://lkml.kernel.org/r/20180102235606.GA19438@bbox Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Hugh Dickins <hughd@google.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/memory.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff -puN mm/memory.c~mm-release-locked-page-in-do_swap_page mm/memory.c --- a/mm/memory.c~mm-release-locked-page-in-do_swap_page +++ a/mm/memory.c @@ -2857,8 +2857,11 @@ int do_swap_page(struct vm_fault *vmf) int ret = 0; bool vma_readahead = swap_use_vma_readahead(); - if (vma_readahead) + if (vma_readahead) { page = swap_readahead_detect(vmf, &swap_ra); + swapcache = page; + } + if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte)) { if (page) put_page(page); @@ -2889,9 +2892,12 @@ int do_swap_page(struct vm_fault *vmf) delayacct_set_flag(DELAYACCT_PF_SWAPIN); - if (!page) + if (!page) { page = lookup_swap_cache(entry, vma_readahead ? vma : NULL, vmf->address); + swapcache = page; + } + if (!page) { struct swap_info_struct *si = swp_swap_info(entry); _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Hang with v4.15-rc trying to swap back in 2018-01-17 22:58 ` Andrew Morton @ 2018-01-17 23:15 ` Hugh Dickins 0 siblings, 0 replies; 11+ messages in thread From: Hugh Dickins @ 2018-01-17 23:15 UTC (permalink / raw) To: Andrew Morton Cc: Hugh Dickins, Minchan Kim, James Bottomley, Linux Memory Management List, Thorsten Leemhuis [-- Attachment #1: Type: TEXT/PLAIN, Size: 4163 bytes --] On Wed, 17 Jan 2018, Andrew Morton wrote: > On Wed, 17 Jan 2018 14:33:21 -0800 (PST) Hugh Dickins <hughd@google.com> wrote: > > On Thu, 28 Dec 2017, James Bottomley wrote: > > > On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote: > > > > I'd guess that since they're both in io_schedule, the problem is that > > > > the io_scheduler is taking far too long servicing the requests due to > > > > some priority issue you've introduced. > > > > > > OK, so after some analysis, that turned out to be incorrect. The > > > problem seems to be that we're exiting do_swap_page() with locked pages > > > that have been read in from swap. > > > > > > Your changelogs are entirely unclear on why you changed the swapcache > > > setting logic in this patch: > > > > > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168 > > > Author: Minchan Kim <minchan@kernel.org> > > > Date: Wed Nov 15 17:33:07 2017 -0800 > > > > > > mm, swap: skip swapcache for swapin of synchronous device > > > > > > But I think you're using swapcache == NULL as a signal the page came > > > from a synchronous device. In which case the bug is that you've > > > forgotten we may already have picked up a page in > > > swap_readahead_detect() which you're wrongly keeping swapcache == NULL > > > for and the fix is this (it works on my system, although I'm still > > > getting an unaccountable shutdown delay). > > > > > > I still think we should revert this series, because this may not be the > > > only bug lurking in the code, so it should go through a lot more > > > rigorous testing than it has. > > > > Andrew, neither the fix below (works for me, though I have seen other > > swap funniness, most probably unrelated), nor the reversion preferred > > by James and Minchan (later in this linux-mm thread), was in 4.15-rc8: > > the sands of time are running out... > > Yup. I'm actually planning on sending in this one. OK by you? Thanks, yes, that looks equivalent to what I've been running with. > > > From: Minchan Kim <minchan@kernel.org> > Subject: mm/memory.c: release locked page in do_swap_page() > > James reported a bug in swap paging-in from his testing. It is that > do_swap_page doesn't release locked page so system hang-up happens due to > a deadlock on PG_locked. > > It was introduced by 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of > synchronous device") because I missed swap cache hit places to update > swapcache variable to work well with other logics against swapcache in > do_swap_page. > > This patch fixes it. > > Debugged by James Bottomley. > > Link: http://lkml.kernel.org/r/<1514407817.4169.4.camel@HansenPartnership.com> > Link: http://lkml.kernel.org/r/20180102235606.GA19438@bbox > Signed-off-by: Minchan Kim <minchan@kernel.org> > Reported-by: James Bottomley <James.Bottomley@hansenpartnership.com> > Cc: Hugh Dickins <hughd@google.com> Acked-by: Hugh Dickins <hughd@google.com> > Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > Cc: Huang Ying <ying.huang@intel.com> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- > > mm/memory.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff -puN mm/memory.c~mm-release-locked-page-in-do_swap_page mm/memory.c > --- a/mm/memory.c~mm-release-locked-page-in-do_swap_page > +++ a/mm/memory.c > @@ -2857,8 +2857,11 @@ int do_swap_page(struct vm_fault *vmf) > int ret = 0; > bool vma_readahead = swap_use_vma_readahead(); > > - if (vma_readahead) > + if (vma_readahead) { > page = swap_readahead_detect(vmf, &swap_ra); > + swapcache = page; > + } > + > if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte)) { > if (page) > put_page(page); > @@ -2889,9 +2892,12 @@ int do_swap_page(struct vm_fault *vmf) > > > delayacct_set_flag(DELAYACCT_PF_SWAPIN); > - if (!page) > + if (!page) { > page = lookup_swap_cache(entry, vma_readahead ? vma : NULL, > vmf->address); > + swapcache = page; > + } > + > if (!page) { > struct swap_info_struct *si = swp_swap_info(entry); > > _ ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2018-01-17 23:15 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <1514398340.3986.10.camel@HansenPartnership.com>
2017-12-27 20:50 ` Hang with v4.15-rc trying to swap back in James Bottomley
2017-12-27 23:26 ` Minchan Kim
2017-12-27 23:34 ` James Bottomley
2017-12-27 23:56 ` Minchan Kim
2017-12-28 17:41 ` James Bottomley
2017-12-28 19:00 ` James Bottomley
2017-12-29 0:00 ` Minchan Kim
2017-12-29 0:45 ` Minchan Kim
2018-01-17 22:33 ` Hugh Dickins
2018-01-17 22:58 ` Andrew Morton
2018-01-17 23:15 ` Hugh Dickins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox