linux-mm.kvack.org archive mirror
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: kosaki.motohiro@jp.fujitsu.com,
	Andrew Morton <akpm@linux-foundation.org>,
	stable@kernel.org, Rik van Riel <riel@redhat.com>,
	Mel Gorman <mel@csn.ul.ie>, Christoph Hellwig <hch@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Nick Piggin <npiggin@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Minchan Kim <minchan.kim@gmail.com>, Andreas Mohr <andi@lisas.de>,
	Bill Davidsen <davidsen@tmr.com>,
	Ben Gamari <bgamari.foss@gmail.com>
Subject: Why PAGEOUT_IO_SYNC stalls for a long time
Date: Wed, 28 Jul 2010 20:40:21 +0900 (JST)
Message-ID: <20100728191322.4A85.A69D9226@jp.fujitsu.com>
In-Reply-To: <20100728071705.GA22964@localhost>

This week I tested some IO-congested workloads for a while, and I have
probably reproduced Andreas's issue.

So, I would like to explain how the current lumpy reclaim works and why it sucks so much.


1. isolate_lru_pages() currently has the following pfn neighbor grabbing logic.

                for (; pfn < end_pfn; pfn++) {
(snip)
                        if (__isolate_lru_page(cursor_page, mode, file) == 0) {
                                list_move(&cursor_page->lru, dst);
                                mem_cgroup_del_lru(cursor_page);
                                nr_taken++;
                                nr_lumpy_taken++;
                                if (PageDirty(cursor_page))
                                        nr_lumpy_dirty++;
                                scan++;
                        } else {
                                if (mode == ISOLATE_BOTH &&
                                                page_count(cursor_page))
                                        nr_lumpy_failed++;
                        }
                }

__isolate_lru_page() failure is mainly caused by one of the following reasons.
  (1) the page has already been freed and is in the buddy allocator.
  (2) the page is used for a non-user-process purpose.
  (3) the page is unevictable (e.g. mlocked).

(2) and (3) have very different characteristics from (1). Lumpy reclaim
means 'reclaiming physically contiguous memory'. That is, if we are trying
an order-9 reclaim, reclaiming 512 pages and reclaiming 511 pages are
completely different: the former means lumpy reclaim succeeded, the latter
means it failed. So, if (2) or (3) occurs, that pfn range has lost any chance
of a successful lumpy reclaim. We should then stop the pfn neighbor search
immediately and try the next page on the LRU (i.e. we should use a 'break'
statement instead of 'continue').
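
A minimal sketch of the difference between 'continue' and 'break', written
as a standalone model rather than kernel code. The page states, the
scan_result struct and scan_block() are hypothetical names made up for this
illustration; they only mirror failure reasons (1) to (3) listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page states for this model; not kernel definitions. */
enum page_state {
    PAGE_ON_LRU,      /* can be isolated */
    PAGE_IN_BUDDY,    /* reason (1): already free */
    PAGE_KERNEL_USE,  /* reason (2): not a user process page */
    PAGE_UNEVICTABLE  /* reason (3): e.g. mlocked */
};

struct scan_result {
    size_t scanned;   /* pfns visited */
    size_t taken;     /* pages isolated */
    int aborted;      /* 1 if the block was abandoned early */
};

/*
 * Walk one aligned block of (1 << order) pages.
 * stop_on_blocker == 0 models 'continue': keep scanning after a page
 * that can never be reclaimed.
 * stop_on_blocker != 0 models 'break': give up on the block at once.
 */
struct scan_result scan_block(const enum page_state *pages,
                              unsigned int order, int stop_on_blocker)
{
    struct scan_result r = {0, 0, 0};
    size_t nr_pages = (size_t)1 << order;

    assert(pages != NULL);

    for (size_t i = 0; i < nr_pages; i++) {
        r.scanned++;
        switch (pages[i]) {
        case PAGE_ON_LRU:
            r.taken++;
            break;
        case PAGE_IN_BUDDY:
            /* Already free, so it does not spoil the contiguous block. */
            break;
        case PAGE_KERNEL_USE:
        case PAGE_UNEVICTABLE:
            /* This page stays in use: the block cannot become free. */
            if (stop_on_blocker) {
                r.aborted = 1;
                return r;
            }
            break;
        }
    }
    return r;
}
```

In this model, an order-9 block (512 pages) with a single unevictable page
at index 3 has 511 pages isolated in 'continue' mode even though the block
can never be freed as a whole. In 'break' mode the scan stops after 4 pfns
with only 3 pages isolated.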

2. The synchronous lumpy reclaim condition is insane.

Currently, synchronous lumpy reclaim is invoked when the following
condition holds.

        if (nr_reclaimed < nr_taken && !current_is_kswapd() &&
                        sc->lumpy_reclaim_mode) {

But "nr_reclaimed < nr_taken" is pretty stupid. If the isolated pages include
many dirty pages, pageout() only issues the first 113 IOs.
(If the IO queue has more than 113 requests, bdi_write_congested() returns
 true and may_write_to_queue() returns false.)

So, for the pages on which we never called ->writepage(), congestion_wait()
and wait_on_page_writeback() are surely stupid.
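
A rough sketch of where the 113 figure can come from. It assumes the default
request queue depth of 128 and the congestion-on threshold formula that the
block layer of that era appears to use in block/blk-core.c; both are
assumptions not stated in this mail. The function names are hypothetical,
and the second function additionally assumes the queue starts empty and no
request completes while pages are being written out.

```c
#include <assert.h>

/* Assumed default queue depth (nr_requests); not taken from this mail. */
#define ASSUMED_NR_REQUESTS 128U

/*
 * Number of queued requests at which the queue is marked congested,
 * following the assumed formula: nr_requests - nr_requests/8 + 1,
 * capped at nr_requests.
 */
unsigned int congestion_on_threshold(unsigned int nr_requests)
{
    unsigned int nr;

    assert(nr_requests > 0);

    nr = nr_requests - (nr_requests / 8) + 1;
    if (nr > nr_requests)
        nr = nr_requests;
    return nr;
}

/*
 * Simplified model: how many of nr_dirty isolated pages get their IO
 * issued before the queue reports congestion and the rest are skipped.
 */
unsigned int pages_written_before_congestion(unsigned int nr_dirty,
                                             unsigned int nr_requests)
{
    unsigned int limit = congestion_on_threshold(nr_requests);

    return nr_dirty < limit ? nr_dirty : limit;
}
```

Under these assumptions congestion_on_threshold(ASSUMED_NR_REQUESTS) gives
128 - 16 + 1 = 113, so out of 512 dirty isolated pages only 113 would have
IO issued and the remaining 399 would never reach ->writepage().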


3. pageout() is intended to be an asynchronous API, but it doesn't work that way.

pageout() calls ->writepage() with wbc->nonblocking=1. The reason is that if
the system uses the default vm.dirty_ratio (i.e. 20), we have 80% clean
memory. So getting stuck on one page is stupid; we should scan as many pages
as possible, as quickly as possible.

HOWEVER, the block layer ignores this argument. If a slow USB memory device
is connected to the system, ->writepage() will sleep for a long time, because
submit_bio() calls get_request_wait() unconditionally and it gives no bonus
to PF_MEMALLOC tasks.


4. Synchronous lumpy reclaim calls clear_active_flags(), but that is also silly.

Now, page_check_references() ignores the pte young bit when we are doing lumpy
reclaim. Then, in almost all cases, PageActive() means "the swap device is
full". Therefore, waiting for IO and retrying pageout() is just silly.


In Andreas's case, congestion_wait() and get_request_wait() are the root cause.
The other issues become problematic with higher-order lumpy reclaim.


Now, I'm preparing some patches and can probably send them tomorrow.


