* Re: [RFC] using writepage to start io
@ 2001-08-05 18:34 Chris Mason
From: Chris Mason @ 2001-08-05 18:34 UTC
  To: Daniel Phillips, linux-kernel; +Cc: linux-mm, torvalds


On Wednesday, August 01, 2001 04:57:35 PM +0200 Daniel Phillips
<phillips@bonn-fries.net> wrote:

> On Tuesday 31 July 2001 21:07, Chris Mason wrote:
>> This has been tested a little more now, both ext2 (1k, 4k) and
>> reiserfs.  dbench and iozone testing don't show any difference, but I
>> need to spend a little more time on the benchmarks.
> 
> It's impressive that such seemingly radical surgery on the vm innards 
> is a) possible and b) doesn't make the system perform noticably worse.

Radical surgery is always possible ;-)  But I was expecting better
performance results than I got.  I'm trying a few other things out here;
more details will come if they work.

My real motivation for the patch, though, is to allow the filesystem better
control over how things get written.  If I can do this without making things
slower, I've won.  The big drawback is how muddy writepage has gotten with
the patch, since I've more or less required checks for partial page writes.

> 
>> The idea is that using flush_dirty_buffers to start i/o under memory
>> pressure is less than optimal.  flush_dirty_buffers knows the oldest
>> dirty buffer, but has no page aging info, so it might not flush a
>> page that we actually want to free.
> 
> Note that the fact that buffers dirtied by ->writepage are ordered by 
> time-dirtied means that the dirty_buffers list really does have 
> indirect knowledge of page aging.  There may well be benefits to your 
> approach but I doubt this is one of them.

A problem is that under memory pressure, we'll flush a buffer that has been
dirty for a long time, even if we are constantly redirtying it and have it
more or less pinned.  This might not be common enough to cause problems,
but it still isn't optimal.  Yes, it is a good idea to flush that page at
some time, but under memory pressure we want to do the least amount of work
that will lead to a freeable page.
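
To make the difference concrete, here's a toy user-space sketch of the two
policies (invented names and a made-up aging model, nothing like the real
bdflush/VM code):

/* toy model, not kernel code: "oldest dirty" vs "coldest page" flushing */
#include <stdio.h>

struct toy_page {
	int dirty;          /* buffer on this page has been dirtied */
	int dirty_age;      /* how long ago it was first dirtied */
	int page_age;       /* VM aging: low = cold, good candidate to free */
};

/* bdflush-style choice: oldest dirty buffer, ignores page aging */
static int pick_oldest_dirty(struct toy_page *p, int n)
{
	int best = -1, i;
	for (i = 0; i < n; i++)
		if (p[i].dirty && (best < 0 || p[i].dirty_age > p[best].dirty_age))
			best = i;
	return best;
}

/* memory-pressure choice: dirty page the VM actually wants to free */
static int pick_coldest_dirty(struct toy_page *p, int n)
{
	int best = -1, i;
	for (i = 0; i < n; i++)
		if (p[i].dirty && (best < 0 || p[i].page_age < p[best].page_age))
			best = i;
	return best;
}

int main(void)
{
	/* page 0: dirtied long ago but hot (constantly redirtied);
	   page 1: dirtied recently but cold and about to be reclaimed */
	struct toy_page pages[2] = { { 1, 100, 90 }, { 1, 5, 1 } };

	printf("oldest-dirty flushes page %d\n", pick_oldest_dirty(pages, 2));
	printf("coldest-dirty flushes page %d\n", pick_coldest_dirty(pages, 2));
	return 0;
}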

> 
> It's surprising that 1K buffer size isn't bothered by being grouped by 
> page in their IO requests.  I'd have thought that this would cause a 
> significant number of writes to be blocked waiting on the page lock 
> held by an unrelated buffer writeout.

Well, for non-buffer-cache pages, we're getting a poor man's write
clustering.  If this doesn't slow down ext2, it's because of good disk layout.

ext2 probably doesn't use the buffer cache enough to show bad results here.
reiserfs on ia64 or alpha might.
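
For anyone not staring at fs/buffer.c daily, here's a toy model of the effect
Daniel describes (purely illustrative, invented names): with 1k buffers on a
4k page, a per-page writepage submits every dirty buffer under one page lock,
so a later write to any buffer on that page waits behind I/O started for an
unrelated neighbour.

#include <stdio.h>

#define BUFS_PER_PAGE 4   /* 1k buffers on a 4k page */

struct toy_buf  { int dirty; };
struct toy_page { int locked; struct toy_buf buf[BUFS_PER_PAGE]; };

/* per-page flushing: lock the page, submit every dirty buffer on it */
static int writepage_cluster(struct toy_page *pg)
{
	int i, submitted = 0;

	pg->locked = 1;                    /* held until all the I/O completes */
	for (i = 0; i < BUFS_PER_PAGE; i++)
		if (pg->buf[i].dirty) {
			pg->buf[i].dirty = 0;
			submitted++;       /* poor man's write clustering */
		}
	return submitted;
	/* any later write to a buffer on this page now waits for pg->locked,
	   even though the I/O it is stuck behind was for a different buffer */
}

int main(void)
{
	struct toy_page pg = { 0, { {1}, {0}, {1}, {1} } };
	printf("submitted %d buffers in one page-sized request\n",
	       writepage_cluster(&pg));
	return 0;
}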

> 
> The most interesting part of your patch to me is the anon_space_mapping.
> It's nice to make buffer handling look more like page cache handling, 
> and get rid of some special cases in the vm scanning.  On the other 
> hand, buffers are different from pages in that, once buffers heads are 
> removed, nobody can find them any more, so they can not be rescued.
> Now, if I'm reading this correctly, buffer pages *will* progress on to 
> the inactive_clean list from the inactive_dirty list instead of jumping 
> that queue and being directly freed by the page_cache_release.

Without my patch, it looks to me like refill_inactive_scan will put buffer
cache pages on the inactive dirty list by calling deactivate_page_nolock.
page_launder catches these by checking page->buffers and calling
try_to_free_buffers, which starts the I/O.

So, the big difference now is just that page_launder sees the page is dirty
and uses writepage to start the I/O, while try_to_free_buffers only waits on
it.  The rest should work more or less the same.
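
Very roughly, the before/after control flow is this (a simplified toy with
made-up helpers, not the real page_launder internals):

#include <stdio.h>

struct toy_page { int dirty; int has_buffers; int io_started; };

static void toy_writepage(struct toy_page *p)       { p->io_started = 1; p->dirty = 0; }
static void toy_start_buffer_io(struct toy_page *p) { p->io_started = 1; p->dirty = 0; }
static void toy_wait_on_buffers(struct toy_page *p) { (void)p; /* block until done */ }

/* old path: freeing the buffers both starts and waits for the I/O */
static void launder_old(struct toy_page *p)
{
	if (p->has_buffers && p->dirty)
		toy_start_buffer_io(p);
	toy_wait_on_buffers(p);
}

/* patched path: the filesystem's writepage starts the I/O,
   and the buffer-freeing step is only left with the waiting */
static void launder_new(struct toy_page *p)
{
	if (p->dirty)
		toy_writepage(p);
	toy_wait_on_buffers(p);
}

int main(void)
{
	struct toy_page a = { 1, 1, 0 }, b = { 1, 1, 0 };
	launder_old(&a);
	launder_new(&b);
	printf("old path: io=%d  new path: io=%d\n", a.io_started, b.io_started);
	return 0;
}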

>  Maybe 
> this is good because it avoids the expensive-looking __free_pages_ok.
> 
> This looks scary:
> 
> +        index = atomic_read(&buffermem_pages) ;
> 
> Because buffermem_pages isn't unique.  This must mean you're never 
> doing page cache lookups for anon_space_mapping, because the 
> mapping+index key isn't unique.  There is a danger here of overloading 
> some hash buckets, which becomes a certainty if you use 0 or some other 
> constant for the index.  If you're never doing page cache lookups, why 
> even enter it into the page hash?

Path of least surprise, I suppose; I knew add_to_page_cache_locked() would
do what I wanted in terms of page setup.  If there's a better way, feel free
to advise ;-)  No page lookups are done on the buffer cache pages.
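
For the record, here's a toy model of why the non-unique index is survivable
(simplified, made-up names; the real hash function is different): the page
hash keys on (mapping, index), and since nothing ever looks these pages up,
the only cost of a repeated index is uneven bucket loading, which using the
buffermem_pages counter keeps spread out.

#include <stdio.h>
#include <stdint.h>

#define HASH_BUCKETS 8

/* toy page hash: keyed on (mapping, index), like the page hash in spirit */
static unsigned int page_hash(const void *mapping, unsigned long index)
{
	return (unsigned int)(((uintptr_t)mapping >> 6) + index) % HASH_BUCKETS;
}

int main(void)
{
	static int anon_mapping;            /* stands in for anon_space_mapping */
	unsigned long buffermem_pages = 0;  /* stands in for the global counter */
	int bucket_load[HASH_BUCKETS] = { 0 };
	int i;

	/* each new buffer page is inserted with index = current counter value.
	   The real counter goes up and down, so indexes repeat and the key is
	   NOT unique -- but nobody ever looks these pages up, so all that
	   matters is how the buckets fill.  A constant index would dump every
	   page into one bucket; the moving counter spreads them around. */
	for (i = 0; i < 64; i++) {
		bucket_load[page_hash(&anon_mapping, buffermem_pages)]++;
		buffermem_pages++;
	}

	for (i = 0; i < HASH_BUCKETS; i++)
		printf("bucket %d: %d pages\n", i, bucket_load[i]);
	return 0;
}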

> That's all for now.  It's a very interesting patch.

Thanks for the comments ;-)

-chris


* Re: [RFC] using writepage to start io
@ 2001-08-07 15:19 Chris Mason
From: Chris Mason @ 2001-08-07 15:19 UTC
  To: Daniel Phillips, linux-kernel; +Cc: linux-mm


On Monday, August 06, 2001 11:18:26 PM +0200 Daniel Phillips
<phillips@bonn-fries.net> wrote:

>> Grin, we're talking in circles.  My point is that by having two
>> threads, bdflush is allowed to skip over older buffers in favor of
>> younger ones because somebody else is responsible for writing the
>> older ones out.
> 
> Yes, and you can't imagine an algorithm that could do that with *one* 
> thread?

Imagine one?  Yes.  We're mixing a bunch of issues, so I'll list the three
different cases again: memory pressure, write throttling, and age limiting.
Assuming a single thread could get enough context information about which of
the three (perhaps more than one) it is currently facing, it could make the
right decisions.

The problem with that right now is that a single thread can't keep up (with
one case, let alone all three) as the number of devices increases.  We could
more or less just replay the entire l-k discussion(s) on threading models here.

In my mind, for a single thread to get the job done, it can't end up waiting
on one device while there are still buffers ready for writeout to idle
devices.
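
To illustrate that constraint (again a toy, not a proposal for the real
code): a single flusher only works if its submission path skips a congested
device instead of sleeping on it while other queues sit idle.

#include <stdio.h>

#define NR_DEVS 3

struct toy_dev {
	int congested;      /* request queue full; a blocking submit would sleep */
	int dirty;          /* dirty buffers queued for this device */
	int submitted;
};

/* single-thread flusher: skip congested devices rather than sleep on them */
static void flush_once(struct toy_dev *dev, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (dev[i].congested)
			continue;       /* come back later, keep other disks busy */
		while (dev[i].dirty) {
			dev[i].dirty--;
			dev[i].submitted++;
		}
	}
}

int main(void)
{
	struct toy_dev dev[NR_DEVS] = { {1, 10, 0}, {0, 4, 0}, {0, 7, 0} };

	flush_once(dev, NR_DEVS);
	printf("dev0 submitted %d (congested), dev1 %d, dev2 %d\n",
	       dev[0].submitted, dev[1].submitted, dev[2].submitted);
	return 0;
}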

As for a generic mechanism to schedule all FS writeback, I've been trying
to use writepage ;-)  The bad part here is that it makes the async issues
even bigger, since the flushing thread ends up calling into the FS (who
knows what that might lead to).

-chris


