From: "Stephen C. Tweedie" <sct@redhat.com>
To: Chris Mason <clmsys@osfmail.isc.rit.edu>
Cc: sct@redhat.com, reiserfs@devlinux.com,
linux-fsdevel@vger.rutgers.edu, linux-mm@kvack.org,
Andrea Arcangeli <andrea@suse.de>, Ingo Molnar <mingo@redhat.com>,
Linus Torvalds <torvalds@transmeta.com>
Subject: RFC: Re: journal ports for 2.3?
Date: Tue, 21 Dec 1999 00:24:09 +0000 (GMT)
Message-ID: <14430.51369.57387.224846@dukat.scot.redhat.com>
In-Reply-To: <000c01bf472c$8ad8cb60$8edb1581@isc.rit.edu>
Hi,
All comments welcome: this is a first draft outline of what I _think_
Linus is asking for from journaling for mainline kernels.
On Wed, 15 Dec 1999 13:45:22 -0500, Chris Mason
<clmsys@osfmail.isc.rit.edu> said:
> What is your current plan for porting ext3 into 2.3/2.4? Are you still
> going to be buffer cache based, or do you plan on moving everything into
> the page cache?
For 2.4 the first release will probably still be in the buffer cache,
but I'm resigned to the fact that Linus won't accept it for a final
merge until it uses an alternative method.
I'd like to talk to you about that if possible. Right now, it looks
as if the following is the absolute minimum required to make ext3,
reiserfs and any unknown future journaled fs'es work properly in 2.3:
* Add an extra "async" parameter to super_operations->write_super()
to distinguish between bdflush and sync()
* Clean up the rules for allowing the raid5 code to snoop the buffer
cache: raid5 should consider a buffer locked and transient if it
has b_count raised
* The raid resync code needs to be atomic wrt. ll_rw_block()
* Whatever caching mechanism we use --- page cache or something else
--- we *must* allow the VM to make callbacks into the filesystem
to indicate memory pressure. There are two cases: first, when
memory gets short, we need to be able to request flush-from-memory
(including clean pages); secondly, if we detect too many dirty
buffers, we need to be able to request flush-to-disk (without
necessarily reclaiming memory, but causing a stall on the calling
process to act as a throttle on heavy write traffic).
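To make the first bullet concrete, here is a minimal user-space C sketch of a write_super() that takes an "async" argument. It is an illustration only, not code from any ext3 or reiserfs patch. The struct layouts are simplified stand-ins for the kernel's, and the counters, journal_write_super() and flush_super() are hypothetical names. The meaning given to the flag (nonzero for a bdflush-style background flush that only starts I/O, zero for a sync() that must wait) is an assumption about how such a parameter would be used.

```c
/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct super_block;

struct super_operations {
    /*
     * Assumed semantics of the proposed flag:
     *   async != 0: background flush (bdflush); start I/O, do not wait.
     *   async == 0: explicit sync(); wait until the data is on disk.
     */
    void (*write_super)(struct super_block *sb, int async);
};

struct super_block {
    const struct super_operations *s_op;
    int commits_started;   /* illustrative counter: commits kicked off */
    int commits_waited;    /* illustrative counter: commits waited for */
};

/* Hypothetical journaling filesystem implementation of the hook. */
static void journal_write_super(struct super_block *sb, int async)
{
    sb->commits_started++;        /* always start a transaction commit */
    if (!async)
        sb->commits_waited++;     /* sync(): block until the commit completes */
}

static const struct super_operations journal_sops = {
    .write_super = journal_write_super,
};

/* Caller side: the dispatch bdflush (async = 1) or sync() (async = 0) would do. */
static void flush_super(struct super_block *sb, int async)
{
    if (sb->s_op && sb->s_op->write_super)
        sb->s_op->write_super(sb, async);
}
```

With this shape a journaled filesystem can start a commit cheaply on every periodic flush and only pay the cost of waiting when a caller has actually asked for sync() semantics.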
For the out-of-memory pressure, ideally all we need is a callback on
the page->mapping address_space. We have one address space per
inode, so adding a struct as_operations to the address_space would
only grow our tables by one pointer per inode, not one pointer per
page.
Shrink_mmap() can easily use such a pointer to perform any
filesystem-specific tearing-down of the page.
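A rough sketch of that idea in plain user-space C follows. Only address_space, page->mapping and the name as_operations come from the text above. The release_page hook, the a_ops field name, the pinned_by_journal flag and vm_try_release() are hypothetical, and the return convention (1 means the page may be freed) is an assumption made for the example.

```c
#include <stddef.h>

struct page;

/* Hypothetical operations table hung off the address_space. */
struct as_operations {
    /* Return 1 if the page may be freed, 0 if the filesystem must keep it. */
    int (*release_page)(struct page *page);
};

struct address_space {
    const struct as_operations *a_ops;   /* one extra pointer per inode */
};

struct page {
    struct address_space *mapping;       /* no extra per-page pointer needed */
    int pinned_by_journal;               /* illustrative: part of a live transaction */
};

/* The kind of check a shrink_mmap()-style scanner could make before freeing. */
static int vm_try_release(struct page *page)
{
    struct address_space *mapping = page->mapping;

    if (mapping && mapping->a_ops && mapping->a_ops->release_page)
        return mapping->a_ops->release_page(page);
    return 1;   /* no filesystem hook: nothing extra to tear down */
}

/* Hypothetical journaling filesystem hook: refuse pages the journal still needs. */
static int journal_release_page(struct page *page)
{
    return !page->pinned_by_journal;
}

static const struct as_operations journal_as_ops = {
    .release_page = journal_release_page,
};
```

The scanner reaches the filesystem through page->mapping, so the cost is one pointer in each address_space rather than one in every struct page.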
The second case is a little more tricky: currently the only
mechanism we have for write throttling under heavy write load is the
refile_buffer() checks in buffer.c. Ideally there should be a
system-wide upper bound on dirty data: if each different filesystem
starts to throttle writes at 50% of physical memory then you only
need two different filesystems to overcommit your memory badly.
A PG_Dirty flag, a global counter of dirty pages and a system-wide
dirty memory threshold would be enough to allow ext3 and reiserfs to
perform their own write throttling in a way which wouldn't fall
apart if both ext3 and reiserfs were present in the system at the
same time. Making the refile_buffer() checks honour that global
threshold would be trivial.
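The accounting described here can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not kernel code: struct cached_page, PG_DIRTY_BIT, nr_dirty_pages, dirty_threshold and the three helper functions are all hypothetical names, and the model is single-threaded, so it ignores the locking or atomic counters a real implementation would need.

```c
/* Stand-in for the proposed PG_Dirty page flag. */
#define PG_DIRTY_BIT (1u << 0)

struct cached_page {
    unsigned int flags;
};

/* One counter and one limit for the whole system, shared by every filesystem. */
static unsigned long nr_dirty_pages;
static unsigned long dirty_threshold = 1024;   /* in pages; value is arbitrary */

static void mark_page_dirty(struct cached_page *p)
{
    if (!(p->flags & PG_DIRTY_BIT)) {          /* count each page only once */
        p->flags |= PG_DIRTY_BIT;
        nr_dirty_pages++;
    }
}

static void mark_page_clean(struct cached_page *p)
{
    if (p->flags & PG_DIRTY_BIT) {
        p->flags &= ~PG_DIRTY_BIT;
        nr_dirty_pages--;
    }
}

/*
 * The question every writer asks, whichever filesystem it is in.
 * Because the limit is global, two filesystems cannot each claim
 * their own 50% of memory.
 */
static int must_throttle_writer(void)
{
    return nr_dirty_pages > dirty_threshold;
}
```

A filesystem's own throttling code, or a refile_buffer()-style check, would call something like must_throttle_writer() and stall the writing process while it returns nonzero.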
The PG_Dirty flag would also allow for VM callbacks to be made to
the filesystems if it was determined that we needed the dirty memory
pages for some other use (as already happens in the buffer cache if
try_to_free_buffers fails and wakes up bdflush). Such a callback
should also be triggered off the address_space.
There are lots of other things which would be useful to journaling, such
as the ll_rw_block-level write ordering enforcement and write barrier,
but the above is really the minimum necessary to actually get the things
to _work_ without intruding into the buffer cache and without destroying
the system's performance if journaled transactions are allowed to grow
without VM back-pressure.
Cheers,
Stephen