From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Nick Piggin <npiggin@suse.de>
Cc: linux-mm@kvack.org
Subject: Re: [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages
Date: Mon, 15 Feb 2010 12:09:49 -0500 [thread overview]
Message-ID: <1266253789.2911.107.camel@localhost> (raw)
In-Reply-To: <20100215055533.GF5723@laptop>
On Mon, 2010-02-15 at 16:55 +1100, Nick Piggin wrote:
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index c06739b..6a0aec7 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -503,6 +503,7 @@ static void balance_dirty_pages(struct address_space *mapping,
> > .nr_to_write = write_chunk,
> > .range_cyclic = 1,
> > };
> > + long bdi_nr_unstable = 0;
> >
> > get_dirty_limits(&background_thresh, &dirty_thresh,
> > &bdi_thresh, bdi);
> > @@ -512,8 +513,10 @@ static void balance_dirty_pages(struct address_space *mapping,
> > nr_writeback = global_page_state(NR_WRITEBACK);
> >
> > bdi_nr_reclaimable = bdi_stat(bdi, BDI_DIRTY);
> > - if (bdi_cap_account_unstable(bdi))
> > - bdi_nr_reclaimable += bdi_stat(bdi, BDI_UNSTABLE);
> > + if (bdi_cap_account_unstable(bdi)) {
> > + bdi_nr_unstable = bdi_stat(bdi, BDI_UNSTABLE);
> > + bdi_nr_reclaimable += bdi_nr_unstable;
> > + }
> > bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
> >
> > if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> > @@ -541,6 +544,11 @@ static void balance_dirty_pages(struct address_space *mapping,
> > * up.
> > */
> > if (bdi_nr_reclaimable > bdi_thresh) {
> > + wbc.force_commit_unstable = 0;
> > + /* Force NFS to also free up unstable writes. */
> > + if (bdi_nr_unstable > bdi_nr_reclaimable / 2)
> > + wbc.force_commit_unstable = 1;
>
> This seems like it is putting NFS specific logic into the VM. OK,
> we already have it because we have these unstable pages, but all
> we really cared about before is that dirty+unstable ~= reclaimable.
>
> Shouldn't NFS just work out its ratio of dirty and unstable pages
> and just do the right thing in its writeback path?
>
Part of the problem is that balance_dirty_pages is looking at per-bdi
statistics, whereas the NFS layer is being called back on an
inode-by-inode basis.
Doing a per-bdi calculation every time we get called back in write_inode
is possible, but can be very inefficient for workloads that involve
writing to several files in parallel. In contrast, all we're really
adding to the VM layer here is a single extra comparison.
The issue here is that the VM wants to do non-blocking I/O using
WB_SYNC_NONE to write out the data. While that is well defined as far as
writeback of dirty pages is concerned, it is difficult to figure a
strategy for handling repeated calls to write_inode().
If we send a 'commit' rpc call every time the balance_dirty_pages loop,
or the bdi_writeback thread triggers a call to write_inode(), then we
end up causing the server to sync its pagecache to disk when we've only
managed to send it a few dirty pages.
We've tried adding a heuristic in the NFS layer that says it should
issue a commit when it sees that there are no more writes in flight for
that inode. However when we do so, we see that balance_dirty_pages ends
up spinning while the last few writes are being sent off.
The point of the extra 'force_commit_unstable' knob is that it allows
the caller to tell the NFS layer that we've written out enough dirty
pages, and that as far as the VM is concerned we can best make progress
by attacking the pileup of unstable writes. As I said above, that can be
done using a single extra comparison in the balance_dirty_pages, instead
of redoing the entire calculation for each inode called by
writeback_inodes_wbc(). Furthermore, it means that the bdi_writeback
thread can continue to do opportunistic writebacks without triggering a
lot of unnecessary flushes on the server.
Cheers
Trond
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-02-15 17:10 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-10 17:03 [PATCH 00/13] Allow the VM to manage NFS unstable writes Trond Myklebust
2010-02-10 17:03 ` [PATCH 01/13] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
2010-02-10 17:03 ` [PATCH 02/13] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-02-10 17:03 ` [PATCH 03/13] NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c Trond Myklebust
2010-02-10 17:03 ` [PATCH 04/13] NFS: Reduce the number of unnecessary COMMIT calls Trond Myklebust
2010-02-10 17:03 ` [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-02-10 17:03 ` [PATCH 06/13] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
2010-02-10 17:03 ` [PATCH 07/13] NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages Trond Myklebust
2010-02-10 17:03 ` [PATCH 08/13] NFS: Simplify nfs_wb_page_cancel() Trond Myklebust
2010-02-10 17:03 ` [PATCH 09/13] NFS: Replace __nfs_write_mapping with sync_inode() Trond Myklebust
2010-02-10 17:03 ` [PATCH 10/13] NFS: Simplify nfs_wb_page() Trond Myklebust
2010-02-10 17:03 ` [PATCH 11/13] NFS: Clean up nfs_sync_mapping Trond Myklebust
2010-02-10 17:03 ` [PATCH 12/13] NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping Trond Myklebust
2010-02-10 17:03 ` [PATCH 13/13] NFS: Don't write out dirty pages in nfs_release_page() Trond Myklebust
2010-02-15 5:55 ` [PATCH 05/13] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Nick Piggin
2010-02-15 17:09 ` Trond Myklebust [this message]
2010-02-15 17:51 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1266253789.2911.107.camel@localhost \
--to=trond.myklebust@netapp.com \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox