From: Miklos Szeredi <miklos@szeredi.hu>
To: torvalds@linux-foundation.org
Cc: miklos@szeredi.hu, a.p.zijlstra@chello.nl, salikhmetov@gmail.com,
linux-mm@kvack.org, jakob@unthought.net,
linux-kernel@vger.kernel.org, valdis.kletnieks@vt.edu,
riel@redhat.com, ksm@42.dk, staubach@redhat.com,
jesper.juhl@gmail.com, akpm@linux-foundation.org,
protasnb@gmail.com, r.e.wolff@bitwizard.nl,
hidave.darkstar@gmail.com, hch@infradead.org
Subject: Re: [PATCH -v8 3/4] Enable the MS_ASYNC functionality in sys_msync()
Date: Thu, 24 Jan 2008 01:05:04 +0100
Message-ID: <E1JHpaa-0004a9-8B@pomaz-ex.szeredi.hu>
In-Reply-To: <alpine.LFD.1.00.0801231329120.2803@woody.linux-foundation.org> (message from Linus Torvalds on Wed, 23 Jan 2008 13:36:45 -0800 (PST))
> > How about doing it in a separate pass, similarly to
> > wait_on_page_writeback()? Just instead of waiting, clean the page
> > tables for writeback pages.
>
> That sounds like a good idea, but it doesn't work.
>
> The thing is, we need to hold the page-table lock over the whole sequence
> of
>
> 	if (page_mkclean(page))
> 		set_page_dirty(page);
> 	if (TestClearPageDirty(page))
> 		..
>
> and there's a big comment about why in clear_page_dirty_for_io().
>
> So if you split it up, so that the first phase is that
>
> 	if (page_mkclean(page))
> 		set_page_dirty(page);
>
> and the second phase is the one that just does a
>
> 	if (TestClearPageDirty(page))
> 		writeback(..)
>
> and having dropped the page lock in between, then you lose: because
> another thread might have faulted in and re-dirtied the page table entry,
> and you MUST NOT do that "TestClearPageDirty()" in that case!
>
> That dirty bit handling is really really important, and it's sadly also
> really really easy to get wrong (usually in ways that are hard to even
> notice: things still work 99% of the time, and you might just be leaking
> memory slowly, and fsync/msync() might not write back memory mapped data
> to disk at all etc).
OK.
But I still think this approach should work. Untested, might eat your
children, so please don't apply to any kernel.
Miklos
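
For reference, the ordering problem quoted above can be replayed in a
small userspace toy model. This is only a sketch and not kernel code:
struct toy_page and the toy_*() helpers are made-up names, and the model
assumes that a write fault dirties both the page table entry and the
page itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one page mapped by one pte. Not kernel code. */
struct toy_page {
	bool pte_dirty;		/* dirty bit in the page table entry */
	bool page_dirty;	/* stands in for PG_dirty */
};

/* Stand-in for page_mkclean(): clean the pte, report whether it was dirty. */
static bool toy_mkclean(struct toy_page *p)
{
	bool was_dirty = p->pte_dirty;

	p->pte_dirty = false;
	return was_dirty;
}

/* Stand-in for TestClearPageDirty(). */
static bool toy_test_clear_dirty(struct toy_page *p)
{
	bool was_dirty = p->page_dirty;

	p->page_dirty = false;
	return was_dirty;
}

/* Assumption of this model: a write fault dirties both pte and page. */
static void toy_write_fault(struct toy_page *p)
{
	p->pte_dirty = true;
	p->page_dirty = true;
}

/*
 * Run the two phases, optionally letting a write fault slip in between.
 * Returns true if the page ends up with a dirty pte but a clean page
 * flag, which is the state the quoted text warns about.
 */
static bool toy_dirty_lost(bool fault_between_phases)
{
	struct toy_page p = { .pte_dirty = true, .page_dirty = false };

	if (toy_mkclean(&p))		/* phase 1 */
		p.page_dirty = true;
	assert(p.page_dirty);		/* pte dirtiness moved to the page */

	if (fault_between_phases)
		toy_write_fault(&p);
	toy_test_clear_dirty(&p);	/* phase 2: writeback would start here */
	if (!fault_between_phases)
		toy_write_fault(&p);	/* same fault, after the sequence */

	return p.pte_dirty && !p.page_dirty;
}
```

In this model toy_dirty_lost(true) reports the lost state and
toy_dirty_lost(false) does not: when the fault lands between the phases,
the page flag is cleared while the pte is already dirty again.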
Index: linux/mm/msync.c
===================================================================
--- linux.orig/mm/msync.c 2008-01-24 00:18:31.000000000 +0100
+++ linux/mm/msync.c 2008-01-24 00:50:37.000000000 +0100
@@ -10,9 +10,91 @@
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include <linux/mman.h>
+#include <linux/pagemap.h>
 #include <linux/file.h>
 #include <linux/syscalls.h>
 #include <linux/sched.h>
+#include <linux/pagevec.h>
+#include <linux/rmap.h>
+#include <linux/backing-dev.h>
+
+static void mkclean_writeback_pages(struct address_space *mapping,
+				    pgoff_t start, pgoff_t end)
+{
+	struct pagevec pvec;
+	pgoff_t index;
+
+	if (!mapping_cap_account_dirty(mapping))
+		return;
+
+	if (end < start)
+		return;
+
+	pagevec_init(&pvec, 0);
+	index = start;
+	while (index <= end) {
+		unsigned i;
+		int nr_pages = min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1;
+
+		nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
+					      PAGECACHE_TAG_WRITEBACK,
+					      nr_pages);
+		if (!nr_pages)
+			break;
+
+		for (i = 0; i < nr_pages; i++) {
+			struct page *page = pvec.pages[i];
+
+			/* until radix tree lookup accepts end_index */
+			if (page->index > end)
+				continue;
+
+			lock_page(page);
+			if (page_mkclean(page))
+				set_page_dirty(page);
+			unlock_page(page);
+		}
+		pagevec_release(&pvec);
+		cond_resched();
+	}
+}
+
+static int msync_range(struct file *file, loff_t start, unsigned long len, unsigned int sync)
+{
+	int ret;
+	struct address_space *mapping = file->f_mapping;
+	loff_t end = start + len - 1;
+	int sync_flags = SYNC_FILE_RANGE_WRITE;
+
+	if (sync) {
+		sync_flags |= SYNC_FILE_RANGE_WAIT_BEFORE;
+	} else {
+		/*
+		 * For MS_ASYNC, don't wait for writeback already in
+		 * progress, but instead just clean the page tables.
+		 */
+		mkclean_writeback_pages(mapping,
+					start >> PAGE_CACHE_SHIFT,
+					end >> PAGE_CACHE_SHIFT);
+	}
+
+	ret = do_sync_mapping_range(mapping, start, end, sync_flags);
+	if (ret || !sync)
+		return ret;
+
+	if (file->f_op && file->f_op->fsync) {
+		mutex_lock(&mapping->host->i_mutex);
+		ret = file->f_op->fsync(file, file->f_path.dentry, 0);
+		mutex_unlock(&mapping->host->i_mutex);
+
+		if (ret < 0)
+			return ret;
+	}
+
+	return wait_on_page_writeback_range(mapping,
+					    start >> PAGE_CACHE_SHIFT,
+					    end >> PAGE_CACHE_SHIFT);
+}
 
 /*
  * MS_SYNC syncs the entire file - including mappings.
@@ -77,18 +159,36 @@ asmlinkage long sys_msync(unsigned long
 			goto out_unlock;
 		}
 		file = vma->vm_file;
-		start = vma->vm_end;
-		if ((flags & MS_SYNC) && file &&
-				(vma->vm_flags & VM_SHARED)) {
+
+		if (file && (vma->vm_flags & VM_SHARED) &&
+		    (flags & (MS_SYNC | MS_ASYNC))) {
+			loff_t offset;
+			unsigned long len;
+
+			/*
+			 * We need to do all of this before we release the mmap_sem,
+			 * since "vma" isn't available after that.
+			 */
+			offset = start - vma->vm_start;
+			offset += (loff_t)vma->vm_pgoff << PAGE_SHIFT;
+			len = end;
+			if (len > vma->vm_end)
+				len = vma->vm_end;
+			len -= start;
+
+			/* Update start here, since vm_end will be gone too.. */
+			start = vma->vm_end;
 			get_file(file);
 			up_read(&mm->mmap_sem);
-			error = do_fsync(file, 0);
+
+			error = msync_range(file, offset, len, flags & MS_SYNC);
 			fput(file);
 			if (error || start >= end)
 				goto out;
 			down_read(&mm->mmap_sem);
 			vma = find_vma(mm, start);
 		} else {
+			start = vma->vm_end;
 			if (start >= end) {
 				error = 0;
 				goto out_unlock;
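
The offset and length arithmetic in the sys_msync() hunk above can be
checked in isolation. The following is a minimal userspace sketch, not
part of the patch: struct toy_vma, struct toy_range, toy_file_range()
and TOY_PAGE_SHIFT are hypothetical names, and 4 KiB pages are assumed.
The sketch widens vm_pgoff to 64 bits before shifting, which is a choice
made for this example.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the three vm_area_struct fields used here. */
struct toy_vma {
	unsigned long vm_start;	/* first mapped address */
	unsigned long vm_end;	/* one past the last mapped address */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

struct toy_range {
	long long offset;	/* byte offset into the file */
	unsigned long len;	/* number of bytes to sync */
};

static struct toy_range toy_file_range(const struct toy_vma *vma,
				       unsigned long start, unsigned long end)
{
	struct toy_range r;

	/* The caller guarantees vm_start <= start < vm_end and start < end. */
	assert(vma->vm_start <= start && start < vma->vm_end && start < end);

	r.offset = (long long)(start - vma->vm_start);
	/* Widen before shifting so a large vm_pgoff cannot wrap. */
	r.offset += (long long)vma->vm_pgoff << TOY_PAGE_SHIFT;

	/* Clamp the range to the end of this vma. */
	r.len = (end > vma->vm_end ? vma->vm_end : end) - start;
	return r;
}
```

For a vma with vm_start = 0x10000, vm_end = 0x20000 and vm_pgoff = 3, a
request for start = 0x12000 and end = 0x30000 gives offset 0x5000 and
len 0xe000, since the range is clamped to vm_end.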