From: Dave Kleikamp <shaggy@austin.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Andrea Arcangeli <andrea@novell.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org
Subject: Re: [PATCH] zap_pte_range should not mark non-uptodate pages dirty
Date: Mon, 25 Oct 2004 08:58:23 -0500 [thread overview]
Message-ID: <1098712703.7260.2.camel@localhost> (raw)
In-Reply-To: <20041022162433.509341e4.akpm@osdl.org>
Andrew,
This patch does fix the oops I was seeing.
Shaggy
On Fri, 2004-10-22 at 18:24, Andrew Morton wrote:
> Something like this...
>
>
>
>
> - When invalidating pages, take care to shoot down any ptes which map them
> as well.
>
> This ensures that the next mmap access to the page will generate a major
> fault, so NFS's server-side modifications are picked up.
>
> This also allows us to call invalidate_complete_page() on all pages, so
> filesytems such as ext3 get a chance to invalidate the buffer_heads.
>
> - Don't mark in-pagetable pages as non-uptodate any more. That broke a
> previous guarantee that mapped-into-user-process pages are always uptodate.
>
> - Check the return value of invalidate_complete_page(). It can fail if
> someone redirties a page after generic_file_direct_IO() write it back.
>
> But we still have a problem. If invalidate_inode_pages2() calls
> unmap_mapping_range(), that can cause zap_pte_range() to dirty the pagecache
> pages. That will redirty the page's buffers and will cause
> invalidate_complete_page() to fail.
>
> So, in generic_file_direct_IO() we do a complete pte shootdown on the file
> up-front, prior to writing back dirty pagecache. This is only done for
> O_DIRECT writes. It _could_ be done for O_DIRECT reads too, providing full
> mmap-vs-direct-IO coherency for both O_DIRECT reads and O_DIRECT writes, but
> permitting the pte shootdown on O_DIRECT reads trivially allows people to nuke
> other people's mapped pagecache.
>
> NFS also uses invalidate_inode_pages2() for handling server-side modification
> notifications. But in the NFS case the clear_page_dirty() in
> invalidate_inode_pages2() is sufficient, because NFS doesn't have to worry
> about the "dirty buffers against a clean page" problem. (I think)
>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
>
> 25-akpm/include/linux/fs.h | 2 -
> 25-akpm/mm/filemap.c | 19 ++++++++++---
> 25-akpm/mm/truncate.c | 63 +++++++++++++++++++++++++++------------------
> 3 files changed, 55 insertions(+), 29 deletions(-)
>
> diff -puN mm/truncate.c~invalidate_inode_pages-mmap-coherency-fix mm/truncate.c
> --- 25/mm/truncate.c~invalidate_inode_pages-mmap-coherency-fix Fri Oct 22 15:40:26 2004
> +++ 25-akpm/mm/truncate.c Fri Oct 22 16:05:26 2004
> @@ -65,6 +65,8 @@ truncate_complete_page(struct address_sp
> * be marked dirty at any time too. So we re-check the dirtiness inside
> * ->tree_lock. That provides exclusion against the __set_page_dirty
> * functions.
> + *
> + * Returns non-zero if the page was successfully invalidated.
> */
> static int
> invalidate_complete_page(struct address_space *mapping, struct page *page)
> @@ -281,50 +283,63 @@ unsigned long invalidate_inode_pages(str
> EXPORT_SYMBOL(invalidate_inode_pages);
>
> /**
> - * invalidate_inode_pages2 - remove all unmapped pages from an address_space
> + * invalidate_inode_pages2 - remove all pages from an address_space
> * @mapping - the address_space
> *
> - * invalidate_inode_pages2() is like truncate_inode_pages(), except for the case
> - * where the page is seen to be mapped into process pagetables. In that case,
> - * the page is marked clean but is left attached to its address_space.
> - *
> - * The page is also marked not uptodate so that a subsequent pagefault will
> - * perform I/O to bringthe page's contents back into sync with its backing
> - * store.
> + * Any pages which are found to be mapped into pagetables are unmapped prior to
> + * invalidation.
> *
> - * FIXME: invalidate_inode_pages2() is probably trivially livelockable.
> + * Returns -EIO if any pages could not be invalidated.
> */
> -void invalidate_inode_pages2(struct address_space *mapping)
> +int invalidate_inode_pages2(struct address_space *mapping)
> {
> struct pagevec pvec;
> pgoff_t next = 0;
> int i;
> + int ret = 0;
> + int did_full_unmap = 0;
>
> pagevec_init(&pvec, 0);
> - while (pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
> - for (i = 0; i < pagevec_count(&pvec); i++) {
> + while (!ret && pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
> + for (i = 0; !ret && i < pagevec_count(&pvec); i++) {
> struct page *page = pvec.pages[i];
>
> lock_page(page);
> - if (page->mapping == mapping) { /* truncate race? */
> - wait_on_page_writeback(page);
> - next = page->index + 1;
> - if (page_mapped(page)) {
> - clear_page_dirty(page);
> - ClearPageUptodate(page);
> + if (page->mapping != mapping) { /* truncate race? */
> + unlock_page(page);
> + continue;
> + }
> + wait_on_page_writeback(page);
> + next = page->index + 1;
> + while (page_mapped(page)) {
> + if (!did_full_unmap) {
> + /*
> + * Zap the rest of the file in one hit.
> + * FIXME: invalidate_inode_pages2()
> + * should take start/end offsets.
> + */
> + unmap_mapping_range(mapping,
> + page->index << PAGE_CACHE_SHIFT,
> + -1, 0);
> + did_full_unmap = 1;
> } else {
> - if (!invalidate_complete_page(mapping,
> - page)) {
> - clear_page_dirty(page);
> - ClearPageUptodate(page);
> - }
> + /*
> + * Just zap this page
> + */
> + unmap_mapping_range(mapping,
> + page->index << PAGE_CACHE_SHIFT,
> + (page->index << PAGE_CACHE_SHIFT)+1,
> + 0);
> }
> }
> + clear_page_dirty(page);
> + if (!invalidate_complete_page(mapping, page))
> + ret = -EIO;
> unlock_page(page);
> }
> pagevec_release(&pvec);
> cond_resched();
> }
> + return ret;
> }
> -
> EXPORT_SYMBOL_GPL(invalidate_inode_pages2);
> diff -puN include/linux/fs.h~invalidate_inode_pages-mmap-coherency-fix include/linux/fs.h
> --- 25/include/linux/fs.h~invalidate_inode_pages-mmap-coherency-fix Fri Oct 22 15:54:50 2004
> +++ 25-akpm/include/linux/fs.h Fri Oct 22 15:54:58 2004
> @@ -1404,7 +1404,7 @@ static inline void invalidate_remote_ino
> S_ISLNK(inode->i_mode))
> invalidate_inode_pages(inode->i_mapping);
> }
> -extern void invalidate_inode_pages2(struct address_space *mapping);
> +extern int invalidate_inode_pages2(struct address_space *mapping);
> extern void write_inode_now(struct inode *, int);
> extern int filemap_fdatawrite(struct address_space *);
> extern int filemap_flush(struct address_space *);
> diff -puN mm/filemap.c~invalidate_inode_pages-mmap-coherency-fix mm/filemap.c
> --- 25/mm/filemap.c~invalidate_inode_pages-mmap-coherency-fix Fri Oct 22 15:55:06 2004
> +++ 25-akpm/mm/filemap.c Fri Oct 22 16:17:19 2004
> @@ -2214,7 +2214,8 @@ ssize_t generic_file_writev(struct file
> EXPORT_SYMBOL(generic_file_writev);
>
> /*
> - * Called under i_sem for writes to S_ISREG files
> + * Called under i_sem for writes to S_ISREG files. Returns -EIO if something
> + * went wrong during pagecache shootdown.
> */
> ssize_t
> generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
> @@ -2224,14 +2225,24 @@ generic_file_direct_IO(int rw, struct ki
> struct address_space *mapping = file->f_mapping;
> ssize_t retval;
>
> + /*
> + * If it's a write, unmap all mmappings of the file up-front. This
> + * will cause any pte dirty bits to be propagated into the pageframes
> + * for the subsequent filemap_write_and_wait().
> + */
> + if (rw == WRITE && mapping_mapped(mapping))
> + unmap_mapping_range(mapping, 0, -1, 0);
> +
> retval = filemap_write_and_wait(mapping);
> if (retval == 0) {
> retval = mapping->a_ops->direct_IO(rw, iocb, iov,
> offset, nr_segs);
> - if (rw == WRITE && mapping->nrpages)
> - invalidate_inode_pages2(mapping);
> + if (rw == WRITE && mapping->nrpages) {
> + int err = invalidate_inode_pages2(mapping);
> + if (err)
> + retval = err;
> + }
> }
> return retval;
> }
> -
> EXPORT_SYMBOL_GPL(generic_file_direct_IO);
> _
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
next prev parent reply other threads:[~2004-10-25 13:58 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-10-21 21:15 Dave Kleikamp
2004-10-21 21:45 ` Andrew Morton
2004-10-21 22:36 ` Andrea Arcangeli
2004-10-21 23:02 ` Andrew Morton
2004-10-21 23:20 ` Andrea Arcangeli
2004-10-21 23:42 ` Andrew Morton
2004-10-22 0:15 ` Andrew Morton
2004-10-22 0:41 ` Andrea Arcangeli
2004-10-22 2:51 ` Rik van Riel
2004-10-22 16:19 ` Andrea Arcangeli
2004-10-22 0:30 ` Andrea Arcangeli
2004-10-22 1:22 ` Andrea Arcangeli
2004-10-22 2:03 ` Andrew Morton
2004-10-22 16:17 ` Andrea Arcangeli
2004-10-22 17:04 ` Andrea Arcangeli
2004-10-22 23:24 ` Andrew Morton
2004-10-25 13:58 ` Dave Kleikamp [this message]
2004-10-26 0:35 ` Andrea Arcangeli
2004-11-09 14:15 ` Dave Kleikamp
2004-11-09 14:46 ` Andrea Arcangeli
2004-11-09 19:51 ` Andrew Morton
2004-11-09 19:46 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1098712703.7260.2.camel@localhost \
--to=shaggy@austin.ibm.com \
--cc=akpm@osdl.org \
--cc=andrea@novell.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox