linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: Hang with v4.15-rc trying to swap back in
       [not found] <1514398340.3986.10.camel@HansenPartnership.com>
@ 2017-12-27 20:50 ` James Bottomley
  2017-12-27 23:26   ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2017-12-27 20:50 UTC (permalink / raw)
  To: Linux Memory Management List; +Cc: Minchan Kim, Andrew Morton

Reverting these three patches fixes the problem:

commit aa8d22a11da933dbf880b4933b58931f4aefe91c
Author: Minchan Kim <minchan@kernel.org>
Date:A A A Wed Nov 15 17:33:11 2017 -0800

A A A A mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page
has no other reference

commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168
Author: Minchan Kim <minchan@kernel.org>
Date:A A A Wed Nov 15 17:33:07 2017 -0800

A A A A mm, swap: skip swapcache for swapin of synchronous device

Also need to revert:

commit e9a6effa500526e2a19d5ad042cb758b55b1ef93
Author: Huang Ying <huang.ying.caritas@gmail.com>
Date:A A A Wed Nov 15 17:33:15 2017 -0800

A A A A mm, swap: fix false error message in __swp_swapcount()

(The latter is simply because it used a function that is eliminated by
one of the other reversions). A They came into the merge window via the
-mm tree as part of a 4 part series:

Subject:	[PATCH v2 0/4] skip swapcache for super fast device
Message-Id:	<1505886205-9671-1-git-send-email-minchan@kernel.org
>

James

On Wed, 2017-12-27 at 10:12 -0800, James Bottomley wrote:
> I think I've seen this a lot shutting down systems, but never manged
> to trace it before. A Now I can reproduce it starting kvm on a 4GB
> system with 3GB of memory and booting up a linux OS. A What eventually
> happens (after logging into the virtual system) is that kvm itself
> hangs, although the stuck process is one of the kworkers in D wait
> hung on this stack trace:
> 
> [<0>] io_schedule+0x12/0x40
> [<0>] __lock_page_or_retry+0x2b8/0x300
> [<0>] do_swap_page+0x1b9/0x910
> [<0>] __handle_mm_fault+0x7ee/0xe20
> [<0>] handle_mm_fault+0xce/0x1e0
> [<0>] __get_user_pages+0x104/0x6c0
> [<0>] get_user_pages_remote+0x84/0x1f0
> [<0>] async_pf_execute+0x67/0x1a0 [kvm]
> [<0>] process_one_work+0x13c/0x370
> [<0>] worker_thread+0x44/0x3e0
> [<0>] kthread+0xf5/0x130
> [<0>] ret_from_fork+0x1f/0x30
> [<0>] 0xffffffffffffffff
> 
> The async_pf_execute() is a kvm async callback, failure to execute it
> appears to be causing the kvm hang.
> 
> Next to go is kswapd
> 
> [<0>] io_schedule+0x12/0x40
> [<0>] __lock_page+0xec/0x120
> [<0>] deferred_split_scan+0x21b/0x2a0
> [<0>] shrink_slab+0x24a/0x460
> [<0>] shrink_node+0x2e6/0x2f0
> [<0>] kswapd+0x2ad/0x730
> [<0>] kthread+0xf5/0x130
> [<0>] ret_from_fork+0x1f/0x30
> [<0>] 0xffffffffffffffff
> 
> And finally systemd-logind hangs making it very difficult to get into
> the system (and impossible to shut it down)
> 
> [<0>] call_rwsem_down_write_failed+0x13/0x20
> [<0>] register_shrinker+0x45/0xa0
> [<0>] sget_userns+0x44d/0x480
> [<0>] mount_nodev+0x2a/0xa0
> [<0>] mount_fs+0x34/0x150
> [<0>] vfs_kern_mount+0x62/0x120
> [<0>] do_mount+0x1d7/0xbf0
> [<0>] SyS_mount+0x7e/0xd0
> [<0>] do_syscall_64+0x5b/0x100
> [<0>] entry_SYSCALL64_slow_path+0x25/0x25
> [<0>] 0xffffffffffffffff
> 
> I've seen this with -rc1 and -rc5, so I think it's some problem with
> merge window code.
> 
> James

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Hang with v4.15-rc trying to swap back in
  2017-12-27 20:50 ` Hang with v4.15-rc trying to swap back in James Bottomley
@ 2017-12-27 23:26   ` Minchan Kim
  2017-12-27 23:34     ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2017-12-27 23:26 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linux Memory Management List, Andrew Morton

Hello James,

On Wed, Dec 27, 2017 at 12:50:17PM -0800, James Bottomley wrote:
> Reverting these three patches fixes the problem:
> 
> commit aa8d22a11da933dbf880b4933b58931f4aefe91c
> Author: Minchan Kim <minchan@kernel.org>
> Date:   Wed Nov 15 17:33:11 2017 -0800
> 
>     mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page
> has no other reference
> 
> commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168
> Author: Minchan Kim <minchan@kernel.org>
> Date:   Wed Nov 15 17:33:07 2017 -0800
> 
>     mm, swap: skip swapcache for swapin of synchronous device
> 
> Also need to revert:
> 
> commit e9a6effa500526e2a19d5ad042cb758b55b1ef93
> Author: Huang Ying <huang.ying.caritas@gmail.com>
> Date:   Wed Nov 15 17:33:15 2017 -0800
> 
>     mm, swap: fix false error message in __swp_swapcount()
> 
> (The latter is simply because it used a function that is eliminated by
> one of the other reversions).  They came into the merge window via the
> -mm tree as part of a 4 part series:
> 
> Subject:	[PATCH v2 0/4] skip swapcache for super fast device
> Message-Id:	<1505886205-9671-1-git-send-email-minchan@kernel.org
> >
> 
> James

Thanks for the report.
Patches are related to synchronous swap devices like brd, zram, nvdimm so

1. What swap device do you use among them?

2. Could you tell me how you can reproduce it?

Thanks.

> 
> On Wed, 2017-12-27 at 10:12 -0800, James Bottomley wrote:
> > I think I've seen this a lot shutting down systems, but never manged
> > to trace it before.  Now I can reproduce it starting kvm on a 4GB
> > system with 3GB of memory and booting up a linux OS.  What eventually
> > happens (after logging into the virtual system) is that kvm itself
> > hangs, although the stuck process is one of the kworkers in D wait
> > hung on this stack trace:
> > 
> > [<0>] io_schedule+0x12/0x40
> > [<0>] __lock_page_or_retry+0x2b8/0x300
> > [<0>] do_swap_page+0x1b9/0x910
> > [<0>] __handle_mm_fault+0x7ee/0xe20
> > [<0>] handle_mm_fault+0xce/0x1e0
> > [<0>] __get_user_pages+0x104/0x6c0
> > [<0>] get_user_pages_remote+0x84/0x1f0
> > [<0>] async_pf_execute+0x67/0x1a0 [kvm]
> > [<0>] process_one_work+0x13c/0x370
> > [<0>] worker_thread+0x44/0x3e0
> > [<0>] kthread+0xf5/0x130
> > [<0>] ret_from_fork+0x1f/0x30
> > [<0>] 0xffffffffffffffff
> > 
> > The async_pf_execute() is a kvm async callback, failure to execute it
> > appears to be causing the kvm hang.
> > 
> > Next to go is kswapd
> > 
> > [<0>] io_schedule+0x12/0x40
> > [<0>] __lock_page+0xec/0x120
> > [<0>] deferred_split_scan+0x21b/0x2a0
> > [<0>] shrink_slab+0x24a/0x460
> > [<0>] shrink_node+0x2e6/0x2f0
> > [<0>] kswapd+0x2ad/0x730
> > [<0>] kthread+0xf5/0x130
> > [<0>] ret_from_fork+0x1f/0x30
> > [<0>] 0xffffffffffffffff
> > 
> > And finally systemd-logind hangs making it very difficult to get into
> > the system (and impossible to shut it down)
> > 
> > [<0>] call_rwsem_down_write_failed+0x13/0x20
> > [<0>] register_shrinker+0x45/0xa0
> > [<0>] sget_userns+0x44d/0x480
> > [<0>] mount_nodev+0x2a/0xa0
> > [<0>] mount_fs+0x34/0x150
> > [<0>] vfs_kern_mount+0x62/0x120
> > [<0>] do_mount+0x1d7/0xbf0
> > [<0>] SyS_mount+0x7e/0xd0
> > [<0>] do_syscall_64+0x5b/0x100
> > [<0>] entry_SYSCALL64_slow_path+0x25/0x25
> > [<0>] 0xffffffffffffffff
> > 
> > I've seen this with -rc1 and -rc5, so I think it's some problem with
> > merge window code.
> > 
> > James
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Hang with v4.15-rc trying to swap back in
  2017-12-27 23:26   ` Minchan Kim
@ 2017-12-27 23:34     ` James Bottomley
  2017-12-27 23:56       ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2017-12-27 23:34 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Linux Memory Management List, Andrew Morton

On Thu, 2017-12-28 at 08:26 +0900, Minchan Kim wrote:
> Hello James,
> 
> On Wed, Dec 27, 2017 at 12:50:17PM -0800, James Bottomley wrote:
> > 
> > Reverting these three patches fixes the problem:
> > 
> > commit aa8d22a11da933dbf880b4933b58931f4aefe91c
> > Author: Minchan Kim <minchan@kernel.org>
> > Date:A A A Wed Nov 15 17:33:11 2017 -0800
> > 
> > A A A A mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped
> > page
> > has no other reference
> > 
> > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168
> > Author: Minchan Kim <minchan@kernel.org>
> > Date:A A A Wed Nov 15 17:33:07 2017 -0800
> > 
> > A A A A mm, swap: skip swapcache for swapin of synchronous device
> > 
> > Also need to revert:
> > 
> > commit e9a6effa500526e2a19d5ad042cb758b55b1ef93
> > Author: Huang Ying <huang.ying.caritas@gmail.com>
> > Date:A A A Wed Nov 15 17:33:15 2017 -0800
> > 
> > A A A A mm, swap: fix false error message in __swp_swapcount()
> > 
> > (The latter is simply because it used a function that is eliminated
> > by
> > one of the other reversions). A They came into the merge window via
> > the
> > -mm tree as part of a 4 part series:
> > 
> > Subject:	[PATCH v2 0/4] skip swapcache for super fast device
> > Message-Id:	<1505886205-9671-1-git-send-email-minchan@kernel
> > .org
> > > 
> > > 
> > 
> > James
> 
> Thanks for the report.
> Patches are related to synchronous swap devices like brd, zram,
> nvdimm so
> 
> 1. What swap device do you use among them?

I've reproduced on nvme and sata spinning rust.

> 2. Could you tell me how you can reproduce it?

The way to reproduce is to force something to swap and then get it to
try to touch the page again. A I do this on my systems by using a large
virtual machine, as I said in the email. A There isn't really any
definitive reproduction method beyond that.

James

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Hang with v4.15-rc trying to swap back in
  2017-12-27 23:34     ` James Bottomley
@ 2017-12-27 23:56       ` Minchan Kim
  2017-12-28 17:41         ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2017-12-27 23:56 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linux Memory Management List, Andrew Morton

On Wed, Dec 27, 2017 at 03:34:49PM -0800, James Bottomley wrote:
> On Thu, 2017-12-28 at 08:26 +0900, Minchan Kim wrote:
> > Hello James,
> > 
> > On Wed, Dec 27, 2017 at 12:50:17PM -0800, James Bottomley wrote:
> > > 
> > > Reverting these three patches fixes the problem:
> > > 
> > > commit aa8d22a11da933dbf880b4933b58931f4aefe91c
> > > Author: Minchan Kim <minchan@kernel.org>
> > > Date:   Wed Nov 15 17:33:11 2017 -0800
> > > 
> > >     mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped
> > > page
> > > has no other reference
> > > 
> > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168
> > > Author: Minchan Kim <minchan@kernel.org>
> > > Date:   Wed Nov 15 17:33:07 2017 -0800
> > > 
> > >     mm, swap: skip swapcache for swapin of synchronous device
> > > 
> > > Also need to revert:
> > > 
> > > commit e9a6effa500526e2a19d5ad042cb758b55b1ef93
> > > Author: Huang Ying <huang.ying.caritas@gmail.com>
> > > Date:   Wed Nov 15 17:33:15 2017 -0800
> > > 
> > >     mm, swap: fix false error message in __swp_swapcount()
> > > 
> > > (The latter is simply because it used a function that is eliminated
> > > by
> > > one of the other reversions).  They came into the merge window via
> > > the
> > > -mm tree as part of a 4 part series:
> > > 
> > > Subject:	[PATCH v2 0/4] skip swapcache for super fast device
> > > Message-Id:	<1505886205-9671-1-git-send-email-minchan@kernel
> > > .org
> > > > 
> > > > 
> > > 
> > > James
> > 
> > Thanks for the report.
> > Patches are related to synchronous swap devices like brd, zram,
> > nvdimm so
> > 
> > 1. What swap device do you use among them?
> 
> I've reproduced on nvme and sata spinning rust.
> 
> > 2. Could you tell me how you can reproduce it?
> 
> The way to reproduce is to force something to swap and then get it to
> try to touch the page again.  I do this on my systems by using a large
> virtual machine, as I said in the email.  There isn't really any
> definitive reproduction method beyond that.
> 

Thanks for the information.
It seems I made a bug on do_swap_page. I want to confirm before sending
formal patch.
Could you try on it?

diff --git a/mm/memory.c b/mm/memory.c
index ca5674cbaff2..240521f1322d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2889,9 +2889,12 @@ int do_swap_page(struct vm_fault *vmf)
 
 
 	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
-	if (!page)
+	if (!page) {
 		page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
 					 vmf->address);
+		swapcache = page;
+	}
+
 	if (!page) {
 		struct swap_info_struct *si = swp_swap_info(entry);
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Hang with v4.15-rc trying to swap back in
  2017-12-27 23:56       ` Minchan Kim
@ 2017-12-28 17:41         ` James Bottomley
  2017-12-28 19:00           ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2017-12-28 17:41 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Linux Memory Management List, Andrew Morton, Thorsten Leemhuis

On Thu, 2017-12-28 at 08:56 +0900, Minchan Kim wrote:
> On Wed, Dec 27, 2017 at 03:34:49PM -0800, James Bottomley wrote:
> > 
> > On Thu, 2017-12-28 at 08:26 +0900, Minchan Kim wrote:
> > > 
> > > Hello James,
> > > 
> > > On Wed, Dec 27, 2017 at 12:50:17PM -0800, James Bottomley wrote:
> > > > 
> > > > 
> > > > Reverting these three patches fixes the problem:
> > > > 
> > > > commit aa8d22a11da933dbf880b4933b58931f4aefe91c
> > > > Author: Minchan Kim <minchan@kernel.org>
> > > > Date:A A A Wed Nov 15 17:33:11 2017 -0800
> > > > 
> > > > A A A A mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if
> > > > swapped page has no other reference
> > > > 
> > > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168
> > > > Author: Minchan Kim <minchan@kernel.org>
> > > > Date:A A A Wed Nov 15 17:33:07 2017 -0800
> > > > 
> > > > A A A A mm, swap: skip swapcache for swapin of synchronous device
> > > > 
> > > > Also need to revert:
> > > > 
> > > > commit e9a6effa500526e2a19d5ad042cb758b55b1ef93
> > > > Author: Huang Ying <huang.ying.caritas@gmail.com>
> > > > Date:A A A Wed Nov 15 17:33:15 2017 -0800
> > > > 
> > > > A A A A mm, swap: fix false error message in __swp_swapcount()
> > > > 
> > > > (The latter is simply because it used a function that is
> > > > eliminated by one of the other reversions). A They came into the
> > > > merge window via the -mm tree as part of a 4 part series:
> > > > 
> > > > Subject:	[PATCH v2 0/4] skip swapcache for super fast
> > > > device
> > > > Message-Id:	<1505886205-9671-1-git-send-email-
> > > > minchan@kernel.org
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > James
> > > 
> > > Thanks for the report.
> > > Patches are related to synchronous swap devices like brd, zram,
> > > nvdimm so
> > > 
> > > 1. What swap device do you use among them?
> > 
> > I've reproduced on nvme and sata spinning rust.
> > 
> > > 
> > > 2. Could you tell me how you can reproduce it?
> > 
> > The way to reproduce is to force something to swap and then get it
> > to try to touch the page again. A I do this on my systems by using a
> > large virtual machine, as I said in the email. A There isn't really
> > any definitive reproduction method beyond that.
> > 
> 
> Thanks for the information. It seems I made a bug on do_swap_page. I
> want to confirm before sending formal patch. Could you try on it?
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index ca5674cbaff2..240521f1322d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2889,9 +2889,12 @@ int do_swap_page(struct vm_fault *vmf)
> A 
> A 
> A 	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> -	if (!page)
> +	if (!page) {
> A 		page = lookup_swap_cache(entry, vma_readahead ? vma
> : NULL,
> A 					A vmf->address);
> +		swapcache = page;
> +	}
> +

This hangs in precisely the same way first kworker then kswapd with the
same stack trace.

I'd guess that since they're both in io_schedule, the problem is that
the io_scheduler is taking far too long servicing the requests due to
some priority issue you've introduced.

Since we're at -rc5, soon to be -rc6, let's just revert the whole
series and you can retry it for 4.16. A The whole point seems to be for
zram, which isn't really a huge use case.

James

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Hang with v4.15-rc trying to swap back in
  2017-12-28 17:41         ` James Bottomley
@ 2017-12-28 19:00           ` James Bottomley
  2017-12-29  0:00             ` Minchan Kim
  2018-01-17 22:33             ` Hugh Dickins
  0 siblings, 2 replies; 11+ messages in thread
From: James Bottomley @ 2017-12-28 19:00 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Linux Memory Management List, Andrew Morton, Thorsten Leemhuis

On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote:
> I'd guess that since they're both in io_schedule, the problem is that
> the io_scheduler is taking far too long servicing the requests due to
> some priority issue you've introduced.

OK, so after some analysis, that turned out to be incorrect. A The
problem seems to be that we're exiting do_swap_page() with locked pages
that have been read in from swap.

Your changelogs are entirely unclear on why you changed the swapcache
setting logic in this patch:

commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168
Author: Minchan Kim <minchan@kernel.org>
Date:A A A Wed Nov 15 17:33:07 2017 -0800

A A A A mm, swap: skip swapcache for swapin of synchronous device

But I think you're using swapcache == NULL as a signal the page came
from a synchronous device. A In which case the bug is that you've
forgotten we may already have picked up a page in
swap_readahead_detect() which you're wrongly keeping swapcache == NULL
for and the fix is this (it works on my system, although I'm still
getting an unaccountable shutdown delay).

I still think we should revert this series, because this may not be the
only bug lurking in the code, so it should go through a lot more
rigorous testing than it has.

James

---

diff --git a/mm/memory.c b/mm/memory.c
index ca5674cbaff2..31f9845c340e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2847,7 +2847,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
 int do_swap_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct page *page = NULL, *swapcache = NULL;
+	struct page *page = NULL, *swapcache;
 	struct mem_cgroup *memcg;
 	struct vma_swap_readahead swap_ra;
 	swp_entry_t entry;
@@ -2892,6 +2892,7 @@ int do_swap_page(struct vm_fault *vmf)
 	if (!page)
 		page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
 					 vmf->address);
+	swapcache = page;
 	if (!page) {
 		struct swap_info_struct *si = swp_swap_info(entry);
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Hang with v4.15-rc trying to swap back in
  2017-12-28 19:00           ` James Bottomley
@ 2017-12-29  0:00             ` Minchan Kim
  2017-12-29  0:45               ` Minchan Kim
  2018-01-17 22:33             ` Hugh Dickins
  1 sibling, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2017-12-29  0:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: Linux Memory Management List, Andrew Morton, Thorsten Leemhuis

On Thu, Dec 28, 2017 at 11:00:40AM -0800, James Bottomley wrote:
> On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote:
> > I'd guess that since they're both in io_schedule, the problem is that
> > the io_scheduler is taking far too long servicing the requests due to
> > some priority issue you've introduced.
> 
> OK, so after some analysis, that turned out to be incorrect.  The
> problem seems to be that we're exiting do_swap_page() with locked pages
> that have been read in from swap.
> 
> Your changelogs are entirely unclear on why you changed the swapcache
> setting logic in this patch:
> 
> commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168
> Author: Minchan Kim <minchan@kernel.org>
> Date:   Wed Nov 15 17:33:07 2017 -0800
> 
>     mm, swap: skip swapcache for swapin of synchronous device
> 
> But I think you're using swapcache == NULL as a signal the page came
> from a synchronous device.  In which case the bug is that you've

Exactly. Because the patchset aims for skipping swap cache for synchronous
device and some logics of do_swap_page has has assumed the page is on
swap cache.

> forgotten we may already have picked up a page in
> swap_readahead_detect() which you're wrongly keeping swapcache == NULL
> for and the fix is this (it works on my system, although I'm still
> getting an unaccountable shutdown delay).

SIGH. I missed that.

> 
> I still think we should revert this series, because this may not be the
> only bug lurking in the code, so it should go through a lot more
> rigorous testing than it has.

I have no problem. It's not urgent.

Andrew, this is reverting patch based on 4.15-rc5. And I need to send
another revert patch against on mmotm because it would have conflict due to
vma-based readahead restructuring patch. I will send soon.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Hang with v4.15-rc trying to swap back in
  2017-12-29  0:00             ` Minchan Kim
@ 2017-12-29  0:45               ` Minchan Kim
  0 siblings, 0 replies; 11+ messages in thread
From: Minchan Kim @ 2017-12-29  0:45 UTC (permalink / raw)
  To: James Bottomley
  Cc: Linux Memory Management List, Andrew Morton, Thorsten Leemhuis

On Fri, Dec 29, 2017 at 09:00:16AM +0900, Minchan Kim wrote:
> On Thu, Dec 28, 2017 at 11:00:40AM -0800, James Bottomley wrote:
> > On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote:
> > > I'd guess that since they're both in io_schedule, the problem is that
> > > the io_scheduler is taking far too long servicing the requests due to
> > > some priority issue you've introduced.
> > 
> > OK, so after some analysis, that turned out to be incorrect.  The
> > problem seems to be that we're exiting do_swap_page() with locked pages
> > that have been read in from swap.
> > 
> > Your changelogs are entirely unclear on why you changed the swapcache
> > setting logic in this patch:
> > 
> > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168
> > Author: Minchan Kim <minchan@kernel.org>
> > Date:   Wed Nov 15 17:33:07 2017 -0800
> > 
> >     mm, swap: skip swapcache for swapin of synchronous device
> > 
> > But I think you're using swapcache == NULL as a signal the page came
> > from a synchronous device.  In which case the bug is that you've
> 
> Exactly. Because the patchset aims for skipping swap cache for synchronous
> device and some logics of do_swap_page has has assumed the page is on
> swap cache.
> 
> > forgotten we may already have picked up a page in
> > swap_readahead_detect() which you're wrongly keeping swapcache == NULL
> > for and the fix is this (it works on my system, although I'm still
> > getting an unaccountable shutdown delay).
> 
> SIGH. I missed that.
> 
> > 
> > I still think we should revert this series, because this may not be the
> > only bug lurking in the code, so it should go through a lot more
> > rigorous testing than it has.
> 
> I have no problem. It's not urgent.
> 
> Andrew, this is reverting patch based on 4.15-rc5. And I need to send
> another revert patch against on mmotm because it would have conflict due to
> vma-based readahead restructuring patch. I will send soon.
> 

We should revert (23c47d2ada9f, bdi: introduce BDI_CAP_SYNCHRONOUS_IO) as well.
I resend reverting patch with including 23c47d2ada9f.

Andrew, Pleas take this patch for linus-tree. Sorry for the confusion.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Hang with v4.15-rc trying to swap back in
  2017-12-28 19:00           ` James Bottomley
  2017-12-29  0:00             ` Minchan Kim
@ 2018-01-17 22:33             ` Hugh Dickins
  2018-01-17 22:58               ` Andrew Morton
  1 sibling, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2018-01-17 22:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, James Bottomley, Linux Memory Management List,
	Thorsten Leemhuis

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2413 bytes --]

On Thu, 28 Dec 2017, James Bottomley wrote:
> On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote:
> > I'd guess that since they're both in io_schedule, the problem is that
> > the io_scheduler is taking far too long servicing the requests due to
> > some priority issue you've introduced.
> 
> OK, so after some analysis, that turned out to be incorrect.  The
> problem seems to be that we're exiting do_swap_page() with locked pages
> that have been read in from swap.
> 
> Your changelogs are entirely unclear on why you changed the swapcache
> setting logic in this patch:
> 
> commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168
> Author: Minchan Kim <minchan@kernel.org>
> Date:   Wed Nov 15 17:33:07 2017 -0800
> 
>     mm, swap: skip swapcache for swapin of synchronous device
> 
> But I think you're using swapcache == NULL as a signal the page came
> from a synchronous device.  In which case the bug is that you've
> forgotten we may already have picked up a page in
> swap_readahead_detect() which you're wrongly keeping swapcache == NULL
> for and the fix is this (it works on my system, although I'm still
> getting an unaccountable shutdown delay).
> 
> I still think we should revert this series, because this may not be the
> only bug lurking in the code, so it should go through a lot more
> rigorous testing than it has.

Andrew, neither the fix below (works for me, though I have seen other
swap funniness, most probably unrelated), nor the reversion preferred
by James and Minchan (later in this linux-mm thread), was in 4.15-rc8:
the sands of time are running out...

Hugh

> 
> James
> 
> ---
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index ca5674cbaff2..31f9845c340e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2847,7 +2847,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
>  int do_swap_page(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
> -	struct page *page = NULL, *swapcache = NULL;
> +	struct page *page = NULL, *swapcache;
>  	struct mem_cgroup *memcg;
>  	struct vma_swap_readahead swap_ra;
>  	swp_entry_t entry;
> @@ -2892,6 +2892,7 @@ int do_swap_page(struct vm_fault *vmf)
>  	if (!page)
>  		page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
>  					 vmf->address);
> +	swapcache = page;
>  	if (!page) {
>  		struct swap_info_struct *si = swp_swap_info(entry);
>  

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Hang with v4.15-rc trying to swap back in
  2018-01-17 22:33             ` Hugh Dickins
@ 2018-01-17 22:58               ` Andrew Morton
  2018-01-17 23:15                 ` Hugh Dickins
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2018-01-17 22:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Minchan Kim, James Bottomley, Linux Memory Management List,
	Thorsten Leemhuis

On Wed, 17 Jan 2018 14:33:21 -0800 (PST) Hugh Dickins <hughd@google.com> wrote:

> On Thu, 28 Dec 2017, James Bottomley wrote:
> > On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote:
> > > I'd guess that since they're both in io_schedule, the problem is that
> > > the io_scheduler is taking far too long servicing the requests due to
> > > some priority issue you've introduced.
> > 
> > OK, so after some analysis, that turned out to be incorrect.  The
> > problem seems to be that we're exiting do_swap_page() with locked pages
> > that have been read in from swap.
> > 
> > Your changelogs are entirely unclear on why you changed the swapcache
> > setting logic in this patch:
> > 
> > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168
> > Author: Minchan Kim <minchan@kernel.org>
> > Date:   Wed Nov 15 17:33:07 2017 -0800
> > 
> >     mm, swap: skip swapcache for swapin of synchronous device
> > 
> > But I think you're using swapcache == NULL as a signal the page came
> > from a synchronous device.  In which case the bug is that you've
> > forgotten we may already have picked up a page in
> > swap_readahead_detect() which you're wrongly keeping swapcache == NULL
> > for and the fix is this (it works on my system, although I'm still
> > getting an unaccountable shutdown delay).
> > 
> > I still think we should revert this series, because this may not be the
> > only bug lurking in the code, so it should go through a lot more
> > rigorous testing than it has.
> 
> Andrew, neither the fix below (works for me, though I have seen other
> swap funniness, most probably unrelated), nor the reversion preferred
> by James and Minchan (later in this linux-mm thread), was in 4.15-rc8:
> the sands of time are running out...

Yup.  I'm actually planning on sending in this one.  OK by you?


From: Minchan Kim <minchan@kernel.org>
Subject: mm/memory.c: release locked page in do_swap_page()

James reported a bug in swap paging-in from his testing.  It is that
do_swap_page doesn't release locked page so system hang-up happens due to
a deadlock on PG_locked.

It was introduced by 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of
synchronous device") because I missed swap cache hit places to update
swapcache variable to work well with other logics against swapcache in
do_swap_page.

This patch fixes it.

Debugged by James Bottomley.

Link: http://lkml.kernel.org/r/<1514407817.4169.4.camel@HansenPartnership.com>
Link: http://lkml.kernel.org/r/20180102235606.GA19438@bbox
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff -puN mm/memory.c~mm-release-locked-page-in-do_swap_page mm/memory.c
--- a/mm/memory.c~mm-release-locked-page-in-do_swap_page
+++ a/mm/memory.c
@@ -2857,8 +2857,11 @@ int do_swap_page(struct vm_fault *vmf)
 	int ret = 0;
 	bool vma_readahead = swap_use_vma_readahead();
 
-	if (vma_readahead)
+	if (vma_readahead) {
 		page = swap_readahead_detect(vmf, &swap_ra);
+		swapcache = page;
+	}
+
 	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte)) {
 		if (page)
 			put_page(page);
@@ -2889,9 +2892,12 @@ int do_swap_page(struct vm_fault *vmf)
 
 
 	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
-	if (!page)
+	if (!page) {
 		page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
 					 vmf->address);
+		swapcache = page;
+	}
+
 	if (!page) {
 		struct swap_info_struct *si = swp_swap_info(entry);
 
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Hang with v4.15-rc trying to swap back in
  2018-01-17 22:58               ` Andrew Morton
@ 2018-01-17 23:15                 ` Hugh Dickins
  0 siblings, 0 replies; 11+ messages in thread
From: Hugh Dickins @ 2018-01-17 23:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Minchan Kim, James Bottomley,
	Linux Memory Management List, Thorsten Leemhuis

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4163 bytes --]

On Wed, 17 Jan 2018, Andrew Morton wrote:
> On Wed, 17 Jan 2018 14:33:21 -0800 (PST) Hugh Dickins <hughd@google.com> wrote:
> > On Thu, 28 Dec 2017, James Bottomley wrote:
> > > On Thu, 2017-12-28 at 09:41 -0800, James Bottomley wrote:
> > > > I'd guess that since they're both in io_schedule, the problem is that
> > > > the io_scheduler is taking far too long servicing the requests due to
> > > > some priority issue you've introduced.
> > > 
> > > OK, so after some analysis, that turned out to be incorrect.  The
> > > problem seems to be that we're exiting do_swap_page() with locked pages
> > > that have been read in from swap.
> > > 
> > > Your changelogs are entirely unclear on why you changed the swapcache
> > > setting logic in this patch:
> > > 
> > > commit 0bcac06f27d7528591c27ac2b093ccd71c5d0168
> > > Author: Minchan Kim <minchan@kernel.org>
> > > Date:   Wed Nov 15 17:33:07 2017 -0800
> > > 
> > >     mm, swap: skip swapcache for swapin of synchronous device
> > > 
> > > But I think you're using swapcache == NULL as a signal the page came
> > > from a synchronous device.  In which case the bug is that you've
> > > forgotten we may already have picked up a page in
> > > swap_readahead_detect() which you're wrongly keeping swapcache == NULL
> > > for and the fix is this (it works on my system, although I'm still
> > > getting an unaccountable shutdown delay).
> > > 
> > > I still think we should revert this series, because this may not be the
> > > only bug lurking in the code, so it should go through a lot more
> > > rigorous testing than it has.
> > 
> > Andrew, neither the fix below (works for me, though I have seen other
> > swap funniness, most probably unrelated), nor the reversion preferred
> > by James and Minchan (later in this linux-mm thread), was in 4.15-rc8:
> > the sands of time are running out...
> 
> Yup.  I'm actually planning on sending in this one.  OK by you?

Thanks, yes, that looks equivalent to what I've been running with.

> 
> 
> From: Minchan Kim <minchan@kernel.org>
> Subject: mm/memory.c: release locked page in do_swap_page()
> 
> James reported a bug in swap paging-in from his testing.  It is that
> do_swap_page doesn't release locked page so system hang-up happens due to
> a deadlock on PG_locked.
> 
> It was introduced by 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of
> synchronous device") because I missed swap cache hit places to update
> swapcache variable to work well with other logics against swapcache in
> do_swap_page.
> 
> This patch fixes it.
> 
> Debugged by James Bottomley.
> 
> Link: http://lkml.kernel.org/r/<1514407817.4169.4.camel@HansenPartnership.com>
> Link: http://lkml.kernel.org/r/20180102235606.GA19438@bbox
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> Reported-by: James Bottomley <James.Bottomley@hansenpartnership.com>
> Cc: Hugh Dickins <hughd@google.com>

Acked-by: Hugh Dickins <hughd@google.com>

> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/memory.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff -puN mm/memory.c~mm-release-locked-page-in-do_swap_page mm/memory.c
> --- a/mm/memory.c~mm-release-locked-page-in-do_swap_page
> +++ a/mm/memory.c
> @@ -2857,8 +2857,11 @@ int do_swap_page(struct vm_fault *vmf)
>  	int ret = 0;
>  	bool vma_readahead = swap_use_vma_readahead();
>  
> -	if (vma_readahead)
> +	if (vma_readahead) {
>  		page = swap_readahead_detect(vmf, &swap_ra);
> +		swapcache = page;
> +	}
> +
>  	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte)) {
>  		if (page)
>  			put_page(page);
> @@ -2889,9 +2892,12 @@ int do_swap_page(struct vm_fault *vmf)
>  
>  
>  	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> -	if (!page)
> +	if (!page) {
>  		page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
>  					 vmf->address);
> +		swapcache = page;
> +	}
> +
>  	if (!page) {
>  		struct swap_info_struct *si = swp_swap_info(entry);
>  
> _

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-01-17 23:15 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1514398340.3986.10.camel@HansenPartnership.com>
2017-12-27 20:50 ` Hang with v4.15-rc trying to swap back in James Bottomley
2017-12-27 23:26   ` Minchan Kim
2017-12-27 23:34     ` James Bottomley
2017-12-27 23:56       ` Minchan Kim
2017-12-28 17:41         ` James Bottomley
2017-12-28 19:00           ` James Bottomley
2017-12-29  0:00             ` Minchan Kim
2017-12-29  0:45               ` Minchan Kim
2018-01-17 22:33             ` Hugh Dickins
2018-01-17 22:58               ` Andrew Morton
2018-01-17 23:15                 ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox