From: "Stephen C. Tweedie" <sct@redhat.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: "Stephen C. Tweedie" <sct@redhat.com>,
Chris Mason <clmsys@osfmail.isc.rit.edu>,
reiserfs@devlinux.com, linux-fsdevel@vger.rutgers.edu,
linux-mm@kvack.org, Ingo Molnar <mingo@redhat.com>,
Linus Torvalds <torvalds@transmeta.com>
Subject: Re: (reiserfs) Re: RFC: Re: journal ports for 2.3?
Date: Wed, 22 Dec 1999 00:28:55 +0000 (GMT)
Message-ID: <14432.6983.669104.707472@dukat.scot.redhat.com>
In-Reply-To: <Pine.LNX.4.21.9912211434320.26889-100000@Fibonacci.suse.de>
Hi,
On Tue, 21 Dec 1999 14:57:29 +0100 (CET), Andrea Arcangeli
<andrea@suse.de> said:
> So you are talking about replacing this line:
> dirty = size_buffers_type[BUF_DIRTY] >> PAGE_SHIFT;
> with:
> dirty = (size_buffers_type[BUF_DIRTY]+size_buffers_type[BUF_PINNED]) >> PAGE_SHIFT;
Basically yes, but I was envisaging something slightly different from
the above.
There may well be data which is simply not in the buffer cache at all
but which needs to be accounted for as pinned memory. A good example
would be if some filesystem wants to implement deferred allocation of
disk blocks: the corresponding pages in the page cache obviously cannot
be flushed to disk without generating extra filesystem activity for the
allocation of disk blocks to pages. The pages must therefore be pinned,
but as they don't yet have disk mappings we can't assume that they are
in the buffer cache.
So we really need a pinned page threshold which can apply to general
pages, not necessarily to the buffer cache.
There's another issue, though. BUF_DIRTY buffers do not necessarily
count as pinned in this context: they can always be flushed to disk
without generating any significant new memory allocation pressure. We
still need to do write-throttling, so we need a threshold on dirty data
for that reason. However, deferred allocation and transactions actually
have a more subtle and nastier property: you cannot necessarily flush
the pages from memory without first allocating more memory.
In the transaction case this is because you have to allow transactions
which are already in progress to complete before you can commit the
transaction (you cannot commit incomplete transactions because that
would defeat the entire point of a transactional system!). In the case
of deferred disk block allocation, the problem is that flushing the
dirty data requires extra filesystem operations as we allocate disk
blocks to pages.
In these cases it is not enough to make sure that pinned memory never
exceeds a threshold: we also have to ensure that the *future*
allocations required to flush the memory already allocated can be
satisfied. We need to allow filesystems to "reserve" such extra
memory, and we need a system-wide threshold on all such reservations.
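As a rough illustration of the reservation idea, the following is a
minimal single-threaded sketch in userspace C, not kernel code. The
struct and function names (mem_budget, budget_reserve and so on) are
hypothetical and appear nowhere in the kernel or in ext3. It assumes a
single limit covering pinned plus reserved pages. A filesystem reserves
its worst-case future need up front and later converts part of that
reservation into pinned pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical accounting state; all quantities are in pages. */
struct mem_budget {
    long pinned;    /* pages that cannot be flushed right now */
    long reserved;  /* pages promised for allocations needed to flush */
    long limit;     /* assumed cap on pinned + reserved */
};

/*
 * Reserve npages for future allocations. Fails if the reservation
 * would push pinned + reserved past the limit, in which case the
 * caller has to wait or flush before starting new work.
 */
static bool budget_reserve(struct mem_budget *b, long npages)
{
    assert(npages >= 0);
    if (b->pinned + b->reserved + npages > b->limit)
        return false;
    b->reserved += npages;
    return true;
}

/* Turn part of an existing reservation into pinned pages. */
static void budget_pin_from_reservation(struct mem_budget *b, long npages)
{
    assert(npages >= 0 && npages <= b->reserved);
    b->reserved -= npages;
    b->pinned += npages;
}

/* Give back reservation that turned out not to be needed. */
static void budget_unreserve(struct mem_budget *b, long npages)
{
    assert(npages >= 0 && npages <= b->reserved);
    b->reserved -= npages;
}

/* Pages were flushed and are no longer pinned. */
static void budget_unpin(struct mem_budget *b, long npages)
{
    assert(npages >= 0 && npages <= b->pinned);
    b->pinned -= npages;
}
```

Because the reservation is charged before any page is pinned, the
allocations needed later to flush those pages have already been counted
against the limit.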
The ext3 journaling code already has support for reservations, but
that's currently a per-filesystem parameter. We still need a global VM
reservation to prevent memory starvation if multiple different
filesystems have this behaviour.
Note that what we need here isn't complex: it's no more than exporting
atomic_t counts of the number of dirty and reserved pages in the system
and supporting a maximum threshold on these values via /proc. The
mechanism for observing these limits can be local to each filesystem,
as long as there is an agreed counter in the VM where filesystems can
register their use of memory.
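A minimal sketch of such a shared counter, assuming C11 <stdatomic.h>
as a userspace stand-in for the kernel's atomic_t. The names
vm_page_account, vm_account_charge and vm_account_uncharge are
hypothetical, and the limit field stands in for a value that would be
tunable via /proc. One instance would exist for dirty pages and another
for reserved pages.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global counter with an adjustable ceiling, in pages. */
struct vm_page_account {
    atomic_long count;  /* pages currently registered by filesystems */
    atomic_long limit;  /* maximum allowed (assumed /proc tunable) */
};

/*
 * Try to register npages against the counter. Uses a compare-and-swap
 * loop so that concurrent callers can never push count above limit.
 * Returns false if the charge would exceed the limit.
 */
static bool vm_account_charge(struct vm_page_account *acct, long npages)
{
    assert(npages > 0);
    long limit = atomic_load(&acct->limit);
    long old = atomic_load(&acct->count);

    do {
        if (old + npages > limit)
            return false;
        /* On failure, old is reloaded with the current count. */
    } while (!atomic_compare_exchange_weak(&acct->count, &old,
                                           old + npages));
    return true;
}

/* Drop npages from the counter once they are flushed or released. */
static void vm_account_uncharge(struct vm_page_account *acct, long npages)
{
    assert(npages > 0);
    atomic_fetch_sub(&acct->count, npages);
}
```

How a filesystem reacts to a failed charge (blocking, forcing a commit,
or throttling the writer) is left to the filesystem, in line with the
point that the mechanism for observing the limits can be local.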
--Stephen