linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>,
	linux-mm@kvack.org, Nick Piggin <npiggin@suse.de>,
	Chris Mason <chris.mason@oracle.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure
Date: Mon, 15 Mar 2010 12:29:48 +0000	[thread overview]
Message-ID: <20100315122948.GJ18274@csn.ul.ie> (raw)
In-Reply-To: <20100312093755.b2393b33.akpm@linux-foundation.org>

On Fri, Mar 12, 2010 at 09:37:55AM -0500, Andrew Morton wrote:
> On Fri, 12 Mar 2010 13:15:05 +0100 Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> wrote:
> 
> > > It still feels a bit unnatural though that the page allocator waits on
> > > congestion when what it really cares about is watermarks. Even if this
> > > patch works for Christian, I think it still has merit so will kick it a
> > > few more times.
> > 
> > In whatever way I can look at it watermark_wait should be supperior to 
> > congestion_wait. Because as Mel points out waiting for watermarks is 
> > what is semantically correct there.
> 
> If a direct-reclaimer waits for some thresholds to be achieved then what
> task is doing reclaim?
> 
> Ultimately, kswapd. 

Well, not quite. The direct reclaimer will still wake up after a timeout
and try again regardless of whether watermarks have been met or not. The
intention is to back after after direct reclaim has failed. Granted, the
window during which a direct reclaim finishes and an allocation attempt
occurs is unnecessarily large. This may be addressed by the patch that
changes where cond_resched() is called.

> This will introduce a hard dependency upon kswapd
> activity.  This might introduce scalability problems.  And latency
> problems if kswapd if off doodling with a slow device (say), or doing a
> journal commit.  And perhaps deadlocks if kswapd tries to take a lock
> which one of the waiting-for-watermark direct relcaimers holds.
> 

What lock could they be holding? Even if that is the case, the direct
reclaimers do not wait indefinitily.

> Generally, kswapd is an optional, best-effort latency optimisation
> thing and we haven't designed for it to be a critical service. 
> Probably stuff would break were we to do so.
> 

No disagreements there.

> This is one of the reasons why we avoided creating such dependencies in
> reclaim.  Instead, what we do when a reclaimer is encountering lots of
> dirty or in-flight pages is
> 
> 	msleep(100);
> 
> then try again.  We're waiting for the disks, not kswapd.
> 
> Only the hard-wired 100 is a bit silly, so we made the "100" variable,
> inversely dependent upon the number of disks and their speed.  If you
> have more and faster disks then you sleep for less time.
> 
> And that's what congestion_wait() does, in a very simplistic fashion. 
> It's a facility which direct-reclaimers use to ratelimit themselves in
> inverse proportion to the speed with which the system can retire writes.
> 

The problem being hit is when a direct reclaimer goes to sleep waiting
on congestion when in reality there were not lots of dirty or in-flight
pages. It goes to sleep for the wrong reasons and doesn't get woken up
again until the timeout expires.

Bear in mind that even if congestion clears, it just means that dirty
pages are now clean although I admit that the next direct reclaim it
does is going to encounter clean pages and should succeed.

Lets see how the other patch that changes when cond_reched() gets called
gets on. If it also works out, then it's harder to justify this patch.
If it doesn't work out then it'll need to be kicked another few times.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-03-15 12:30 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-08 11:48 Mel Gorman
2010-03-08 11:48 ` [PATCH 1/3] page-allocator: Under memory pressure, wait on pressure to relieve instead of congestion Mel Gorman
2010-03-09 13:35   ` Nick Piggin
2010-03-09 14:17     ` Mel Gorman
2010-03-09 15:03       ` Nick Piggin
2010-03-09 15:42         ` Christian Ehrhardt
2010-03-09 18:22           ` Mel Gorman
2010-03-10  2:38             ` Nick Piggin
2010-03-09 17:35         ` Mel Gorman
2010-03-10  2:35           ` Nick Piggin
2010-03-09 15:50   ` Christoph Lameter
2010-03-09 15:56     ` Christian Ehrhardt
2010-03-09 16:09       ` Christoph Lameter
2010-03-09 17:01         ` Mel Gorman
2010-03-09 17:11           ` Christoph Lameter
2010-03-09 17:30             ` Mel Gorman
2010-03-08 11:48 ` [PATCH 2/3] page-allocator: Check zone pressure when batch of pages are freed Mel Gorman
2010-03-09  9:53   ` Nick Piggin
2010-03-09 10:08     ` Mel Gorman
2010-03-09 10:23       ` Nick Piggin
2010-03-09 10:36         ` Mel Gorman
2010-03-09 11:11           ` Nick Piggin
2010-03-09 11:29             ` Mel Gorman
2010-03-08 11:48 ` [PATCH 3/3] vmscan: Put kswapd to sleep on its own waitqueue, not congestion Mel Gorman
2010-03-09 10:00   ` Nick Piggin
2010-03-09 10:21     ` Mel Gorman
2010-03-09 10:32       ` Nick Piggin
2010-03-11 23:41 ` [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure Andrew Morton
2010-03-12  6:39   ` Christian Ehrhardt
2010-03-12  7:05     ` Andrew Morton
2010-03-12 10:47       ` Mel Gorman
2010-03-12 12:15         ` Christian Ehrhardt
2010-03-12 14:37           ` Andrew Morton
2010-03-15 12:29             ` Mel Gorman [this message]
2010-03-15 14:45               ` Christian Ehrhardt
2010-03-15 12:34             ` Christian Ehrhardt
2010-03-15 20:09               ` Andrew Morton
2010-03-16 10:11                 ` Mel Gorman
2010-03-18 17:42                 ` Mel Gorman
2010-03-22 23:50                 ` Mel Gorman
2010-03-23 14:35                   ` Christian Ehrhardt
2010-03-23 21:35                   ` Corrado Zoccolo
2010-03-24 11:48                     ` Mel Gorman
2010-03-24 12:56                       ` Corrado Zoccolo
2010-03-23 22:29                   ` Rik van Riel
2010-03-24 14:50                     ` Mel Gorman
2010-04-19 12:22                       ` Christian Ehrhardt
2010-04-19 21:44                         ` Johannes Weiner
2010-04-20  7:20                           ` Christian Ehrhardt
2010-04-20  8:54                             ` Christian Ehrhardt
2010-04-20 15:32                             ` Johannes Weiner
2010-04-20 17:22                               ` Rik van Riel
2010-04-21  4:23                                 ` Christian Ehrhardt
2010-04-21  7:35                                   ` Christian Ehrhardt
2010-04-21 13:19                                     ` Rik van Riel
2010-04-22  6:21                                       ` Christian Ehrhardt
2010-04-26 10:59                                         ` Subject: [PATCH][RFC] mm: make working set portion that is protected tunable v2 Christian Ehrhardt
2010-04-26 11:59                                           ` KOSAKI Motohiro
2010-04-26 12:43                                             ` Christian Ehrhardt
2010-04-26 14:20                                               ` Rik van Riel
2010-04-27 14:00                                                 ` Christian Ehrhardt
2010-04-21  9:03                                   ` [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure Johannes Weiner
2010-04-21 13:20                                   ` Rik van Riel
2010-04-20 14:40                           ` Rik van Riel
2010-03-24  2:38                   ` Greg KH
2010-03-24 11:49                     ` Mel Gorman
2010-03-24 13:13                   ` Johannes Weiner
2010-03-12  9:09   ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100315122948.GJ18274@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=akpm@linux-foundation.org \
    --cc=chris.mason@oracle.com \
    --cc=ehrhardt@linux.vnet.ibm.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox