From: Hugh Dickins <hughd@google.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org
Subject: [PATCH] mm: compaction beware writeback
Date: Sat, 19 Mar 2011 23:27:38 -0700 (PDT)
Message-ID: <alpine.LSU.2.00.1103192318100.1877@sister.anvils>
I notice there's a Bug 31142 "Large write to USB stick freezes"
discussion happening (which I've not digested), for which Andrea
is proposing a patch which reminds me of this one. Thought I'd
better throw this into the mix for consideration.
I'd not sent it in yet, because I only see the problem on one machine,
and then only with a shmem patch I'm working up; but I can't see how
that patch would actually be necessary to create the problem.
It happens in my extfs-on-loop-on-tmpfs swapping tests, when copying
in the kernel tree. I believe the relevant traces are these three:
I notice sync_supers there every time it hangs, but I guess it comes
along after, and gets stuck on the same page which cp is waiting for.
D sync_supers:
schedule +0x670
io_schedule +0x50
sync_buffer +0x68
__wait_on_bit +0x90
out_of_line_wait_on_bit +0x98
__wait_on_buffer +0x30
__sync_dirty_buffer +0xc0
ext4_commit_super +0x2c4
ext4_write_super +0x28
sync_supers +0xdc
bdi_sync_supers +0x40
kthread +0xac
kernel_thread +0x54
D loop0:
schedule +0x670
io_schedule +0x50
sync_page +0x84
__wait_on_bit +0x90
wait_on_page_bit +0xa4
unmap_and_move +0x180
migrate_pages +0xbc
compact_zone +0xbc
compact_zone_order +0xc8
try_to_compact_pages +0x104
__alloc_pages_direct_compact +0xc0
__alloc_pages_nodemask +0x68c
allocate_slab +0x84
new_slab +0x58
__slab_alloc +0x1ec
kmem_cache_alloc +0x7c
radix_tree_preload +0x94
add_to_page_cache_locked +0x78
shmem_getpage +0x208
pagecache_write_begin +0x2c
do_lo_send_aops +0xc0
do_bio_filebacked +0x11c
loop_thread +0x204
kthread +0xac
kernel_thread +0x54
D cp:
schedule +0x670
io_schedule +0x50
sync_buffer +0x68
__wait_on_bit +0x90
out_of_line_wait_on_bit +0x98
__wait_on_buffer +0x30
ext4_find_entry +0x230
ext4_lookup +0x44
d_alloc_and_lookup +0x74
do_last +0xe0
do_filp_open +0x2b8
do_sys_open +0x8c
compat_sys_open +0x24
syscall_exit +0x0
I believe (but haven't verified for sure) that what happens is this:
compaction (entered in the loop0 daemon trace when allocating a
radix_tree node, for which SLUB asks for order 2) chooses to migrate
the cp page under writeback, and that writeback is itself waiting for
loop0 to write it. So loop0 ends up waiting on its own work.
So I've extended your earlier PF_MEMALLOC patch to prevent waiting for
writeback as well as waiting for pagelock. And I've never seen the
hang again since putting this patch in.
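To make the intended behaviour concrete, here is a minimal user-space
sketch of the decision the patch aims for. It is not kernel code: the
model_* names and MODEL_PF_MEMALLOC are hypothetical stand-ins, and it
assumes that the writeback wait later in unmap_and_move() is gated on
the same force flag as lock_page(), which the hunk below does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PF_MEMALLOC task flag. */
#define MODEL_PF_MEMALLOC 0x1u

/* Only the two page states that matter for this discussion. */
struct model_page {
	bool locked;     /* someone else holds the page lock */
	bool writeback;  /* page is under writeback */
};

enum model_action {
	MODEL_SKIP,      /* give up on this page (-EAGAIN), never sleep */
	MODEL_WAIT,      /* would sleep waiting for the lock or writeback */
	MODEL_MIGRATE,   /* nothing to wait for, go ahead */
};

/*
 * Model of the patched policy: a task already in direct reclaim or
 * direct compaction (PF_MEMALLOC set) has force cleared up front, so
 * it can neither sleep on the page lock nor on writeback.
 * Assumption: both waits are conditional on force.
 */
static enum model_action model_unmap_and_move(const struct model_page *page,
					      unsigned int task_flags,
					      bool force)
{
	assert(page != NULL);

	if (task_flags & MODEL_PF_MEMALLOC)
		force = false;

	if (page->locked || page->writeback)
		return force ? MODEL_WAIT : MODEL_SKIP;

	return MODEL_MIGRATE;
}
```

In this model a page under writeback is skipped whenever the caller
has MODEL_PF_MEMALLOC set, even if force was requested, whereas a
caller without the flag may still wait when force is set.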
Signed-off-by: Hugh Dickins <hughd@google.com>
---
mm/migrate.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
--- 2.6.38/mm/migrate.c 2011-03-14 18:20:32.000000000 -0700
+++ linux/mm/migrate.c 2011-03-15 06:36:26.000000000 -0700
@@ -637,29 +637,33 @@ static int unmap_and_move(new_page_t get
if (unlikely(split_huge_page(page)))
goto move_newpage;
+ /*
+ * It's not safe for direct compaction to call lock_page.
+ * For example, during page readahead pages are added locked
+ * to the LRU. Later, when the IO completes the pages are
+ * marked uptodate and unlocked. However, the queueing
+ * could be merging multiple pages for one bio (e.g.
+ * mpage_readpages). If an allocation happens for the
+ * second or third page, the process can end up locking
+ * the same page twice and deadlocking. Rather than
+ * trying to be clever about what pages can be locked,
+ * avoid the use of lock_page for direct compaction
+ * altogether.
+ *
+ * Nor is it safe for direct compaction to wait_on_page_writeback:
+ * we might be trying to allocate on behalf of that writeback (e.g.
+ * slub allocating an order-2 page for a radix_tree node for the
+ * loop device below, might target that very page under writeback).
+ */
+ if (current->flags & PF_MEMALLOC)
+ force = 0;
+
/* prepare cgroup just returns 0 or -ENOMEM */
rc = -EAGAIN;
if (!trylock_page(page)) {
if (!force)
goto move_newpage;
-
- /*
- * It's not safe for direct compaction to call lock_page.
- * For example, during page readahead pages are added locked
- * to the LRU. Later, when the IO completes the pages are
- * marked uptodate and unlocked. However, the queueing
- * could be merging multiple pages for one bio (e.g.
- * mpage_readpages). If an allocation happens for the
- * second or third page, the process can end up locking
- * the same page twice and deadlocking. Rather than
- * trying to be clever about what pages can be locked,
- * avoid the use of lock_page for direct compaction
- * altogether.
- */
- if (current->flags & PF_MEMALLOC)
- goto move_newpage;
-
lock_page(page);
}