[PATCH] mm: compaction beware writeback

From: Hugh Dickins <hughd@google.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org
Subject: [PATCH] mm: compaction beware writeback
Date: Sat, 19 Mar 2011 23:27:38 -0700 (PDT)
Message-ID: <alpine.LSU.2.00.1103192318100.1877@sister.anvils>

I notice there's a Bug 31142 "Large write to USB stick freezes"
discussion happening (which I've not digested), for which Andrea
is proposing a patch which reminds me of this one.  Thought I'd
better throw this into the mix for consideration.

I'd not sent it in yet, because I only see the problem on one machine,
and then only with a shmem patch I'm working up; but I can't see how
that patch would actually be necessary to create the problem.

It happens in my extfs-on-loop-on-tmpfs swapping tests, when copying
in the kernel tree.  I believe the relevant traces are the three below.
I notice sync_supers there every time it hangs, but I guess it comes
along afterwards, and gets stuck on the same page which cp is waiting for.

D  sync_supers:
schedule +0x670
io_schedule +0x50
sync_buffer +0x68
__wait_on_bit +0x90
out_of_line_wait_on_bit +0x98
__wait_on_buffer +0x30
__sync_dirty_buffer +0xc0
ext4_commit_super +0x2c4
ext4_write_super +0x28
sync_supers +0xdc
bdi_sync_supers +0x40
kthread +0xac
kernel_thread +0x54

D  loop0:
schedule +0x670
io_schedule +0x50
sync_page +0x84
__wait_on_bit +0x90
wait_on_page_bit +0xa4
unmap_and_move +0x180
migrate_pages +0xbc
compact_zone +0xbc
compact_zone_order +0xc8
try_to_compact_pages +0x104
__alloc_pages_direct_compact +0xc0
__alloc_pages_nodemask +0x68c
allocate_slab +0x84
new_slab +0x58
__slab_alloc +0x1ec
kmem_cache_alloc +0x7c
radix_tree_preload +0x94
add_to_page_cache_locked +0x78
shmem_getpage +0x208
pagecache_write_begin +0x2c
do_lo_send_aops +0xc0
do_bio_filebacked +0x11c
loop_thread +0x204
kthread +0xac
kernel_thread +0x54

D  cp:
schedule +0x670
io_schedule +0x50
sync_buffer +0x68
__wait_on_bit +0x90
out_of_line_wait_on_bit +0x98
__wait_on_buffer +0x30
ext4_find_entry +0x230
ext4_lookup +0x44
d_alloc_and_lookup +0x74
do_last +0xe0
do_filp_open +0x2b8
do_sys_open +0x8c
compat_sys_open +0x24
syscall_exit +0x0

I believe (but haven't verified for sure) that what happens is that
compaction (when trying to allocate a radix_tree node - SLUB asks
for order 2 - in the loop0 daemon trace) chooses the cp page under
writeback which is waiting for loop0 to write it.
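
As an illustration only (not part of the patch), the suspected circular
wait can be sketched as a tiny user-space model.  The enum names, the
waits_on[] table and stuck_forever() are all hypothetical; the edges
simply encode the unverified reading of the traces given above.

```c
#include <assert.h>

/* Hypothetical parties in the suspected deadlock; not kernel symbols. */
enum party { NOBODY = -1, CP, PAGE_UNDER_WRITEBACK, LOOP0, NPARTIES };

/*
 * waits_on[p] names the single party that p is blocked on, or NOBODY.
 * Follow the chain from 'start'.  If nparties edges can be walked
 * without reaching a party that is free to run, some party must have
 * been visited twice, so the chain contains a cycle and nobody in it
 * can ever make progress.
 */
static int stuck_forever(const int waits_on[], int nparties, int start)
{
	int cur = start;
	int step;

	assert(start >= 0 && start < nparties);
	for (step = 0; step < nparties; step++) {
		cur = waits_on[cur];
		if (cur == NOBODY)
			return 0;	/* chain ends at a runnable party */
	}
	return 1;			/* cycle: permanent hang */
}
```

In this model, cp waits on the page, the page waits on loop0 to write
it, and loop0 (inside compaction) waits on the page: a cycle.  If loop0
backs off instead of waiting, its entry becomes NOBODY and the chain
from cp terminates.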

So I've extended your earlier PF_MEMALLOC patch to prevent waiting for
writeback as well as waiting for pagelock.  And I've never seen the
hang again since putting this patch in.
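
The intended effect on the blocking decisions can be sketched in user
space as follows.  This is a simplified model, not the kernel code:
struct model_page, MODEL_PF_MEMALLOC, enum outcome and
model_unmap_and_move() are made-up names, and the model ignores any
other conditions the real unmap_and_move() checks before waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's PF_MEMALLOC task flag (value is arbitrary). */
#define MODEL_PF_MEMALLOC 0x1u

/* Only the two page states that matter for this sketch. */
struct model_page {
	bool locked;
	bool writeback;
};

enum outcome { PROCEED, BACK_OFF, WAIT_FOR_LOCK, WAIT_FOR_WRITEBACK };

/*
 * Model of the decision: a task running with PF_MEMALLOC set (direct
 * compaction) has 'force' cleared up front, so it backs off from a
 * locked page and from a page under writeback alike, instead of
 * sleeping on either.
 */
static enum outcome model_unmap_and_move(const struct model_page *page,
					 unsigned int task_flags, int force)
{
	assert(page != NULL);

	if (task_flags & MODEL_PF_MEMALLOC)
		force = 0;

	if (page->locked)
		return force ? WAIT_FOR_LOCK : BACK_OFF;
	if (page->writeback)
		return force ? WAIT_FOR_WRITEBACK : BACK_OFF;
	return PROCEED;
}
```

With a page under writeback and force set, an ordinary caller gets
WAIT_FOR_WRITEBACK, while a caller with MODEL_PF_MEMALLOC set gets
BACK_OFF, which corresponds to the -EAGAIN path in the patch below.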

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/migrate.c |   38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

--- 2.6.38/mm/migrate.c	2011-03-14 18:20:32.000000000 -0700
+++ linux/mm/migrate.c	2011-03-15 06:36:26.000000000 -0700
@@ -637,29 +637,33 @@ static int unmap_and_move(new_page_t get
 		if (unlikely(split_huge_page(page)))
 			goto move_newpage;
 
+	/*
+	 * It's not safe for direct compaction to call lock_page.
+	 * For example, during page readahead pages are added locked
+	 * to the LRU. Later, when the IO completes the pages are
+	 * marked uptodate and unlocked. However, the queueing
+	 * could be merging multiple pages for one bio (e.g.
+	 * mpage_readpages). If an allocation happens for the
+	 * second or third page, the process can end up locking
+	 * the same page twice and deadlocking. Rather than
+	 * trying to be clever about what pages can be locked,
+	 * avoid the use of lock_page for direct compaction
+	 * altogether.
+	 *
+	 * Nor is it safe for direct compaction to wait_on_page_writeback:
+	 * we might be trying to allocate on behalf of that writeback (e.g.
+	 * slub allocating an order-2 page for a radix_tree node for the
+	 * loop device below, might target that very page under writeback).
+	 */
+	if (current->flags & PF_MEMALLOC)
+		force = 0;
+
 	/* prepare cgroup just returns 0 or -ENOMEM */
 	rc = -EAGAIN;
 
 	if (!trylock_page(page)) {
 		if (!force)
 			goto move_newpage;
-
-		/*
-		 * It's not safe for direct compaction to call lock_page.
-		 * For example, during page readahead pages are added locked
-		 * to the LRU. Later, when the IO completes the pages are
-		 * marked uptodate and unlocked. However, the queueing
-		 * could be merging multiple pages for one bio (e.g.
-		 * mpage_readpages). If an allocation happens for the
-		 * second or third page, the process can end up locking
-		 * the same page twice and deadlocking. Rather than
-		 * trying to be clever about what pages can be locked,
-		 * avoid the use of lock_page for direct compaction
-		 * altogether.
-		 */
-		if (current->flags & PF_MEMALLOC)
-			goto move_newpage;
-
 		lock_page(page);
 	}
 


