linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Anton Altaparmakov <aia21@cam.ac.uk>
To: Nick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Trond Myklebust <trond.myklebust@fys.uio.no>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC] extent mapped page cache
Date: Thu, 26 Jul 2007 08:53:21 +0100	[thread overview]
Message-ID: <377C3238-BFBE-49B4-98E7-EB866A4EBAC6@cam.ac.uk> (raw)
In-Reply-To: <20070726023639.GD20727@wotan.suse.de>

On 26 Jul 2007, at 03:36, Nick Piggin wrote:

> On Wed, Jul 25, 2007 at 10:10:07PM -0400, Chris Mason wrote:
>> On Thu, 26 Jul 2007 03:37:28 +0200
>> Nick Piggin <npiggin@suse.de> wrote:
>>
>>>
>>>> One advantage to the state tree is that it separates the state from
>>>> the memory being described, allowing a simple kmap style interface
>>>> that covers subpages, highmem and superpages.
>>>
>>> I suppose so, although we should have added those interfaces long
>>> ago ;) The variants in fsblock are pretty good, and you could always
>>> do an arbitrary extent (rather than block) based API using the
>>> pagecache tree if it would be helpful.
>>
>> Yes, you could use fsblock for the state bits and make a separate API
>> to map the actual pages.
>>
>>>
>>>
>>>> It also more naturally matches the way we want to do IO, making for
>>>> easy clustering.
>>>
>>> Well the pagecache tree is used to reasonable effect for that now.
>>> OK the code isn't beautiful ;). Granted, this might be an area where
>>> the seperate state tree ends up being better. We'll see.
>>>
>>
>> One thing it gains us is finding the start of the cluster.  Even if
>> called by kswapd, the state tree allows writepage to find the  
>> start of
>> the cluster and send down a big bio (provided I implement trylock to
>> avoid various deadlocks).
>
> That's very true, we could potentially also do that with the block  
> extent
> tree that I want to try with fsblock.
>
> I'm looking at "cleaning up" some of these aops APIs so hopefully  
> most of
> the deadlock problems go away. Should be useful to both our  
> efforts. Will
> post patches hopefully when I get time to finish the draft this  
> weekend.
>
>
>>>> O_DIRECT becomes a special case of readpages and writepages....the
>>>> memory used for IO just comes from userland instead of the page
>>>> cache.
>>>
>>> Could be, although you'll probably also need to teach the mm about
>>> the state tree and/or still manipulate the pagecache tree to prevent
>>> concurrency?
>>
>> Well, it isn't coded yet, but I should be able to do it from the FS
>> specific ops.
>
> Probably, if you invalidate all the pagecache in the range beforehand
> you should be able to do it (and I guess you want to do the invalidate
> anyway). Although, below deadlock issues might still bite somehwere...
>
>
>>> But isn't the main aim of O_DIRECT to do as little locking and
>>> synchronisation with the pagecache as possible? I thought this is
>>> why your race fixing patches got put on the back burner (although
>>> they did look fairly nice from a correctness POV).
>>
>> I put the placeholder patches on hold because handling a corner case
>> where userland did O_DIRECT from a mmap'd region of the same file  
>> (Linus
>> pointed it out to me).  Basically my patches had to work in 64k  
>> chunks
>> to avoid a deadlock in get_user_pages.  With the state tree, I can
>> allow the page to be faulted in but still properly deal with it.
>
> Oh right, I didn't think of that one. Would you still have similar
> issues with the external state tree? I mean, the filesystem doesn't
> really know why the fault is taken. O_DIRECT read from a file into
> mmapped memory of the same block in the file is almost hopeless I
> think.
>
>
>>> Well I'm kind of handwaving when it comes to O_DIRECT ;) It does  
>>> look
>>> like this might be another advantage of the state tree (although you
>>> aren't allowed to slow down buffered IO to achieve the locking ;)).
>>
>> ;) The O_DIRECT benefit is a fringe thing.  I've long wanted to help
>> clean up that code, but the real point of the patch is to make  
>> general
>> usage faster and less complex.  If I can't get there, the O_DIRECT
>> stuff doesn't matter.
>
> Sure, although unifying code is always a plus so I like that you've
> got that in mind.
>
>
>>>> The ability to put in additional tracking info like the process  
>>>> that
>>>> first dirtied a range is also significant.  So, I think it is worth
>>>> trying.
>>>
>>> Definitely, and I'm glad you are. You haven't converted me yet, but
>>> I look forward to finding the best ideas from our two approaches  
>>> when
>>> the patches are further along (ext2 port of fsblock coming along, so
>>> we'll be able to have races soon :P).
>>
>> I'm sure we can find some river in Cambridge, winner gets to throw
>> Axboe in.
>
> Very noble of you to donate your colleage to such a worthy cause.

Cambridge = Cam + Bridge = Bridge over the river Cam

The Cam is a bit muddy though so it is not a very enjoyable  
experience falling/being thrown into it.

Nice for punting on though when the weather is nice.  (-:

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-07-26  7:53 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-10 21:03 Chris Mason
2007-07-12  7:00 ` Daniel Phillips
2007-07-18 14:18   ` Chris Mason
2007-07-24 20:00 ` Chris Mason
2007-07-24 20:03   ` [PATCH RFC] extent mapped page cache main code Chris Mason
2007-07-24 20:04   ` [PATCH RFC] ext2 extentmap support Chris Mason
2007-07-24 20:13   ` [PATCH RFC] extent mapped page cache Trond Myklebust
2007-07-24 21:25     ` Peter Zijlstra
2007-07-24 23:25       ` Chris Mason
2007-07-25  2:32         ` Nick Piggin
2007-07-25 12:18           ` Chris Mason
2007-07-26  1:37             ` Nick Piggin
2007-07-26  2:10               ` Chris Mason
2007-07-26  2:36                 ` Nick Piggin
2007-07-26  7:53                   ` Anton Altaparmakov [this message]
2007-07-26 13:05                   ` Chris Mason
2007-07-27  1:15                     ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=377C3238-BFBE-49B4-98E7-EB866A4EBAC6@cam.ac.uk \
    --to=aia21@cam.ac.uk \
    --cc=a.p.zijlstra@chello.nl \
    --cc=chris.mason@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=trond.myklebust@fys.uio.no \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox