* [PATCH 0/2] yet another attempt to fix the ctime and mtime issue @ 2008-01-13 4:39 Anton Salikhmetov 2008-01-13 4:39 ` [PATCH 1/2] massive code cleanup of sys_msync() Anton Salikhmetov 2008-01-13 4:39 ` [PATCH 2/2] updating ctime and mtime at syncing Anton Salikhmetov 0 siblings, 2 replies; 20+ messages in thread From: Anton Salikhmetov @ 2008-01-13 4:39 UTC (permalink / raw) To: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb The POSIX standard requires that the ctime and mtime fields for memory-mapped files should be updated after a write reference to the memory region where the file data is mapped. At least FreeBSD 6.2 and HP-UX 11i implement this properly. Linux does not, which leads to data loss problems in database backup applications. Kernel Bug Tracker contains more information about the problem: http://bugzilla.kernel.org/show_bug.cgi?id=2645 There have been several attempts in the past to address this issue. Following are a few links to LKML discussions related to this bug: http://lkml.org/lkml/2006/5/17/138 http://lkml.org/lkml/2007/2/21/242 http://lkml.org/lkml/2008/1/7/234 All earlier solutions were criticized. Some solutions did not handle memory-mapped block devices properly. Some led to forcing applications to explicitly call msync() to update file metadata. Some contained errors in using kernel synchronization primitives. In the two patches that follow, I would like to propose a new solution. This is the third version of my changes. This version takes into account all feedback I received for the two previous versions. The overall design remains basically the same as the one that was acked by Rick van Riel: http://lkml.org/lkml/2008/1/11/208 To the best of my knowledge, these patches are free of all the drawbacks found during previous attempts by Peter Staubach, Miklos Szeredi and myself. New since the previous version: 1) no need to explicitly call msync() to update file times; 2) changing block device data is visible to all device files associated with the block device; 3) in the cleanup part, the error checks are separated out as suggested by Rik van Riel; 4) some small refinements accodring to the LKML comments. This is how I tested the patches. 1. To test the features mentioned above, I wrote a unit test available from http://bugzilla.kernel.org/attachment.cgi?id=14430 I verified that the unit test passed successfully for both regular files and block device files. For the unit test I used the following architectures: 32-bit x86, x86_64 and MIPS32 (cross-compiled from x86_64). 2. I did build tests with allmodconfig and allyesconfig on x86_64. 3. I ran the following test cases from the LTP test suite: msync01 msync02 msync03 msync04 msync05 mmapstress01 mmapstress09 mmapstress10 No regressions were found by these test cases. I think that the bug #2645 is resolved by these patches. Please apply. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/2] massive code cleanup of sys_msync() 2008-01-13 4:39 [PATCH 0/2] yet another attempt to fix the ctime and mtime issue Anton Salikhmetov @ 2008-01-13 4:39 ` Anton Salikhmetov 2008-01-13 4:46 ` Rik van Riel 2008-01-14 10:49 ` Miklos Szeredi 2008-01-13 4:39 ` [PATCH 2/2] updating ctime and mtime at syncing Anton Salikhmetov 1 sibling, 2 replies; 20+ messages in thread From: Anton Salikhmetov @ 2008-01-13 4:39 UTC (permalink / raw) To: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb Substantial code cleanup of the sys_msync() function: 1) using the PAGE_ALIGN() macro instead of "manual" alignment; 2) improved readability of the loop traversing the process memory regions. Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> --- mm/msync.c | 78 +++++++++++++++++++++++++++--------------------------------- 1 files changed, 35 insertions(+), 43 deletions(-) diff --git a/mm/msync.c b/mm/msync.c index 144a757..ff654c9 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -1,24 +1,25 @@ /* * linux/mm/msync.c * + * The msync() system call. * Copyright (C) 1994-1999 Linus Torvalds + * + * Substantial code cleanup. + * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com> */ -/* - * The msync() system call. - */ +#include <linux/file.h> #include <linux/fs.h> #include <linux/mm.h> #include <linux/mman.h> -#include <linux/file.h> -#include <linux/syscalls.h> #include <linux/sched.h> +#include <linux/syscalls.h> /* * MS_SYNC syncs the entire file - including mappings. * * MS_ASYNC does not start I/O (it used to, up to 2.5.67). - * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). + * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). * Now it doesn't do anything, since dirty pages are properly tracked. * * The application may now run fsync() to @@ -33,71 +34,62 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) unsigned long end; struct mm_struct *mm = current->mm; struct vm_area_struct *vma; - int unmapped_error = 0; - int error = -EINVAL; + int error = 0, unmapped_error = 0; if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) - goto out; + return -EINVAL; if (start & ~PAGE_MASK) - goto out; + return -EINVAL; if ((flags & MS_ASYNC) && (flags & MS_SYNC)) - goto out; - error = -ENOMEM; - len = (len + ~PAGE_MASK) & PAGE_MASK; + return -EINVAL; + + len = PAGE_ALIGN(len); end = start + len; if (end < start) - goto out; - error = 0; + return -ENOMEM; if (end == start) - goto out; + return 0; + /* * If the interval [start,end) covers some unmapped address ranges, * just ignore them, but return -ENOMEM at the end. */ down_read(&mm->mmap_sem); vma = find_vma(mm, start); - for (;;) { + do { struct file *file; - /* Still start < end. */ - error = -ENOMEM; - if (!vma) - goto out_unlock; - /* Here start < vma->vm_end. */ + if (!vma) { + error = -ENOMEM; + break; + } if (start < vma->vm_start) { start = vma->vm_start; - if (start >= end) - goto out_unlock; + if (start >= end) { + error = -ENOMEM; + break; + } unmapped_error = -ENOMEM; } - /* Here vma->vm_start <= start < vma->vm_end. */ - if ((flags & MS_INVALIDATE) && - (vma->vm_flags & VM_LOCKED)) { + if ((flags & MS_INVALIDATE) && (vma->vm_flags & VM_LOCKED)) { error = -EBUSY; - goto out_unlock; + break; } file = vma->vm_file; - start = vma->vm_end; - if ((flags & MS_SYNC) && file && - (vma->vm_flags & VM_SHARED)) { + if ((flags & MS_SYNC) && file && (vma->vm_flags & VM_SHARED)) { get_file(file); up_read(&mm->mmap_sem); error = do_fsync(file, 0); fput(file); - if (error || start >= end) - goto out; + if (error) + return error; down_read(&mm->mmap_sem); - vma = find_vma(mm, start); - } else { - if (start >= end) { - error = 0; - goto out_unlock; - } - vma = vma->vm_next; } - } -out_unlock: + + start = vma->vm_end; + vma = vma->vm_next; + } while (start < end); up_read(&mm->mmap_sem); -out: + return error ? : unmapped_error; } -- 1.4.4.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] massive code cleanup of sys_msync() 2008-01-13 4:39 ` [PATCH 1/2] massive code cleanup of sys_msync() Anton Salikhmetov @ 2008-01-13 4:46 ` Rik van Riel 2008-01-14 10:49 ` Miklos Szeredi 1 sibling, 0 replies; 20+ messages in thread From: Rik van Riel @ 2008-01-13 4:46 UTC (permalink / raw) To: Anton Salikhmetov Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb On Sun, 13 Jan 2008 07:39:58 +0300 Anton Salikhmetov <salikhmetov@gmail.com> wrote: > Substantial code cleanup of the sys_msync() function: > > 1) using the PAGE_ALIGN() macro instead of "manual" alignment; > 2) improved readability of the loop traversing the process memory regions. > > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> -- All rights reversed. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] massive code cleanup of sys_msync() 2008-01-13 4:39 ` [PATCH 1/2] massive code cleanup of sys_msync() Anton Salikhmetov 2008-01-13 4:46 ` Rik van Riel @ 2008-01-14 10:49 ` Miklos Szeredi 2008-01-14 11:56 ` Anton Salikhmetov 1 sibling, 1 reply; 20+ messages in thread From: Miklos Szeredi @ 2008-01-14 10:49 UTC (permalink / raw) To: salikhmetov Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb > Substantial code cleanup of the sys_msync() function: > > 1) using the PAGE_ALIGN() macro instead of "manual" alignment; > 2) improved readability of the loop traversing the process memory regions. Thanks for doing this. See comments below. > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> > --- > mm/msync.c | 78 +++++++++++++++++++++++++++--------------------------------- > 1 files changed, 35 insertions(+), 43 deletions(-) > > diff --git a/mm/msync.c b/mm/msync.c > index 144a757..ff654c9 100644 > --- a/mm/msync.c > +++ b/mm/msync.c > @@ -1,24 +1,25 @@ > /* > * linux/mm/msync.c > * > + * The msync() system call. > * Copyright (C) 1994-1999 Linus Torvalds > + * > + * Substantial code cleanup. > + * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com> > */ > > -/* > - * The msync() system call. > - */ > +#include <linux/file.h> > #include <linux/fs.h> > #include <linux/mm.h> > #include <linux/mman.h> > -#include <linux/file.h> > -#include <linux/syscalls.h> > #include <linux/sched.h> > +#include <linux/syscalls.h> > > /* > * MS_SYNC syncs the entire file - including mappings. > * > * MS_ASYNC does not start I/O (it used to, up to 2.5.67). > - * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). > + * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). > * Now it doesn't do anything, since dirty pages are properly tracked. > * > * The application may now run fsync() to > @@ -33,71 +34,62 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) > unsigned long end; > struct mm_struct *mm = current->mm; > struct vm_area_struct *vma; > - int unmapped_error = 0; > - int error = -EINVAL; > + int error = 0, unmapped_error = 0; > > if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) > - goto out; > + return -EINVAL; > if (start & ~PAGE_MASK) > - goto out; > + return -EINVAL; > if ((flags & MS_ASYNC) && (flags & MS_SYNC)) > - goto out; > - error = -ENOMEM; > - len = (len + ~PAGE_MASK) & PAGE_MASK; > + return -EINVAL; > + > + len = PAGE_ALIGN(len); > end = start + len; > if (end < start) > - goto out; > - error = 0; > + return -ENOMEM; > if (end == start) > - goto out; > + return 0; > + > /* > * If the interval [start,end) covers some unmapped address ranges, > * just ignore them, but return -ENOMEM at the end. > */ > down_read(&mm->mmap_sem); > vma = find_vma(mm, start); > - for (;;) { > + do { > struct file *file; > > - /* Still start < end. */ > - error = -ENOMEM; > - if (!vma) > - goto out_unlock; > - /* Here start < vma->vm_end. */ > + if (!vma) { > + error = -ENOMEM; > + break; > + } > if (start < vma->vm_start) { > start = vma->vm_start; > - if (start >= end) > - goto out_unlock; > + if (start >= end) { > + error = -ENOMEM; > + break; > + } > unmapped_error = -ENOMEM; > } > - /* Here vma->vm_start <= start < vma->vm_end. */ > - if ((flags & MS_INVALIDATE) && > - (vma->vm_flags & VM_LOCKED)) { > + if ((flags & MS_INVALIDATE) && (vma->vm_flags & VM_LOCKED)) { > error = -EBUSY; > - goto out_unlock; > + break; > } > file = vma->vm_file; > - start = vma->vm_end; > - if ((flags & MS_SYNC) && file && > - (vma->vm_flags & VM_SHARED)) { > + if ((flags & MS_SYNC) && file && (vma->vm_flags & VM_SHARED)) { > get_file(file); > up_read(&mm->mmap_sem); > error = do_fsync(file, 0); > fput(file); > - if (error || start >= end) > - goto out; > + if (error) > + return error; > down_read(&mm->mmap_sem); > - vma = find_vma(mm, start); Where did this line go? It's needed because after releasing and reacquiring the mmap sem, the old vma may have gone away. I suggest, that when doing such a massive cleanup, that you split it up even further into easily understandable pieces, so such bugs cannot creep in. > - } else { > - if (start >= end) { > - error = 0; > - goto out_unlock; > - } > - vma = vma->vm_next; > } > - } > -out_unlock: > + > + start = vma->vm_end; > + vma = vma->vm_next; > + } while (start < end); > up_read(&mm->mmap_sem); > -out: > + > return error ? : unmapped_error; > } > -- > 1.4.4.4 Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/2] massive code cleanup of sys_msync() 2008-01-14 10:49 ` Miklos Szeredi @ 2008-01-14 11:56 ` Anton Salikhmetov 0 siblings, 0 replies; 20+ messages in thread From: Anton Salikhmetov @ 2008-01-14 11:56 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb 2008/1/14, Miklos Szeredi <miklos@szeredi.hu>: > > Substantial code cleanup of the sys_msync() function: > > > > 1) using the PAGE_ALIGN() macro instead of "manual" alignment; > > 2) improved readability of the loop traversing the process memory regions. > > Thanks for doing this. See comments below. > > > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> > > --- > > mm/msync.c | 78 +++++++++++++++++++++++++++--------------------------------- > > 1 files changed, 35 insertions(+), 43 deletions(-) > > > > diff --git a/mm/msync.c b/mm/msync.c > > index 144a757..ff654c9 100644 > > --- a/mm/msync.c > > +++ b/mm/msync.c > > @@ -1,24 +1,25 @@ > > /* > > * linux/mm/msync.c > > * > > + * The msync() system call. > > * Copyright (C) 1994-1999 Linus Torvalds > > + * > > + * Substantial code cleanup. > > + * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com> > > */ > > > > -/* > > - * The msync() system call. > > - */ > > +#include <linux/file.h> > > #include <linux/fs.h> > > #include <linux/mm.h> > > #include <linux/mman.h> > > -#include <linux/file.h> > > -#include <linux/syscalls.h> > > #include <linux/sched.h> > > +#include <linux/syscalls.h> > > > > /* > > * MS_SYNC syncs the entire file - including mappings. > > * > > * MS_ASYNC does not start I/O (it used to, up to 2.5.67). > > - * Nor does it marks the relevant pages dirty (it used to up to 2.6.17). > > + * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). > > * Now it doesn't do anything, since dirty pages are properly tracked. > > * > > * The application may now run fsync() to > > @@ -33,71 +34,62 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) > > unsigned long end; > > struct mm_struct *mm = current->mm; > > struct vm_area_struct *vma; > > - int unmapped_error = 0; > > - int error = -EINVAL; > > + int error = 0, unmapped_error = 0; > > > > if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC)) > > - goto out; > > + return -EINVAL; > > if (start & ~PAGE_MASK) > > - goto out; > > + return -EINVAL; > > if ((flags & MS_ASYNC) && (flags & MS_SYNC)) > > - goto out; > > - error = -ENOMEM; > > - len = (len + ~PAGE_MASK) & PAGE_MASK; > > + return -EINVAL; > > + > > + len = PAGE_ALIGN(len); > > end = start + len; > > if (end < start) > > - goto out; > > - error = 0; > > + return -ENOMEM; > > if (end == start) > > - goto out; > > + return 0; > > + > > /* > > * If the interval [start,end) covers some unmapped address ranges, > > * just ignore them, but return -ENOMEM at the end. > > */ > > down_read(&mm->mmap_sem); > > vma = find_vma(mm, start); > > - for (;;) { > > + do { > > struct file *file; > > > > - /* Still start < end. */ > > - error = -ENOMEM; > > - if (!vma) > > - goto out_unlock; > > - /* Here start < vma->vm_end. */ > > + if (!vma) { > > + error = -ENOMEM; > > + break; > > + } > > if (start < vma->vm_start) { > > start = vma->vm_start; > > - if (start >= end) > > - goto out_unlock; > > + if (start >= end) { > > + error = -ENOMEM; > > + break; > > + } > > unmapped_error = -ENOMEM; > > } > > - /* Here vma->vm_start <= start < vma->vm_end. */ > > - if ((flags & MS_INVALIDATE) && > > - (vma->vm_flags & VM_LOCKED)) { > > + if ((flags & MS_INVALIDATE) && (vma->vm_flags & VM_LOCKED)) { > > error = -EBUSY; > > - goto out_unlock; > > + break; > > } > > file = vma->vm_file; > > - start = vma->vm_end; > > - if ((flags & MS_SYNC) && file && > > - (vma->vm_flags & VM_SHARED)) { > > + if ((flags & MS_SYNC) && file && (vma->vm_flags & VM_SHARED)) { > > get_file(file); > > up_read(&mm->mmap_sem); > > error = do_fsync(file, 0); > > fput(file); > > - if (error || start >= end) > > - goto out; > > + if (error) > > + return error; > > down_read(&mm->mmap_sem); > > - vma = find_vma(mm, start); > > Where did this line go? It's needed because after releasing and > reacquiring the mmap sem, the old vma may have gone away. > > I suggest, that when doing such a massive cleanup, that you split it > up even further into easily understandable pieces, so such bugs cannot > creep in. Thanks for your review. I overlooked this problem. I'll redo my cleanup soon. > > > - } else { > > - if (start >= end) { > > - error = 0; > > - goto out_unlock; > > - } > > - vma = vma->vm_next; > > } > > - } > > -out_unlock: > > + > > + start = vma->vm_end; > > + vma = vma->vm_next; > > + } while (start < end); > > up_read(&mm->mmap_sem); > > -out: > > + > > return error ? : unmapped_error; > > } > > -- > > 1.4.4.4 > > Miklos > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/2] updating ctime and mtime at syncing 2008-01-13 4:39 [PATCH 0/2] yet another attempt to fix the ctime and mtime issue Anton Salikhmetov 2008-01-13 4:39 ` [PATCH 1/2] massive code cleanup of sys_msync() Anton Salikhmetov @ 2008-01-13 4:39 ` Anton Salikhmetov 2008-01-13 4:59 ` Rik van Riel 2008-01-14 11:08 ` Miklos Szeredi 1 sibling, 2 replies; 20+ messages in thread From: Anton Salikhmetov @ 2008-01-13 4:39 UTC (permalink / raw) To: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb Changes for updating the ctime and mtime fields for memory-mapped files: 1) new flag triggering update of the inode data; 2) new function to update ctime and mtime for block device files; 3) new helper function to update ctime and mtime when needed; 4) updating time stamps for mapped files in sys_msync() and do_fsync(); 5) implementing the feature of auto-updating ctime and mtime. Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> --- fs/buffer.c | 1 + fs/fs-writeback.c | 2 ++ fs/inode.c | 42 +++++++++++++++++++++++++++++++++++------- fs/sync.c | 2 ++ include/linux/fs.h | 9 ++++++++- include/linux/pagemap.h | 3 ++- mm/msync.c | 24 ++++++++++++++++-------- mm/page-writeback.c | 1 + 8 files changed, 67 insertions(+), 17 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 7249e01..09adf7e 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -719,6 +719,7 @@ static int __set_page_dirty(struct page *page, } write_unlock_irq(&mapping->tree_lock); __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + set_bit(AS_MCTIME, &mapping->flags); return 1; } diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 0fca820..c25ebd5 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -243,6 +243,8 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc) spin_unlock(&inode_lock); + mapping_update_time(mapping); + ret = do_writepages(mapping, wbc); /* Don't write the inode if only I_DIRTY_PAGES was set */ diff --git a/fs/inode.c b/fs/inode.c index ed35383..c02bfab 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1243,8 +1243,8 @@ void touch_atime(struct vfsmount *mnt, struct dentry *dentry) EXPORT_SYMBOL(touch_atime); /** - * file_update_time - update mtime and ctime time - * @file: file accessed + * inode_update_time - update mtime and ctime time + * @inode: inode accessed * * Update the mtime and ctime members of an inode and mark the inode * for writeback. Note that this function is meant exclusively for @@ -1253,10 +1253,8 @@ EXPORT_SYMBOL(touch_atime); * S_NOCTIME inode flag, e.g. for network filesystem where these * timestamps are handled by the server. */ - -void file_update_time(struct file *file) +void inode_update_time(struct inode *inode) { - struct inode *inode = file->f_path.dentry->d_inode; struct timespec now; int sync_it = 0; @@ -1279,8 +1277,39 @@ void file_update_time(struct file *file) if (sync_it) mark_inode_dirty_sync(inode); } +EXPORT_SYMBOL(inode_update_time); + +/* + * Update the ctime and mtime stamps for memory-mapped block device files. + */ +static void bd_inode_update_time(struct inode *inode) +{ + struct block_device *bdev = inode->i_bdev; + struct list_head *p; + + if (bdev == NULL) + return; + + mutex_lock(&bdev->bd_mutex); + list_for_each(p, &bdev->bd_inodes) { + inode = list_entry(p, struct inode, i_devices); + inode_update_time(inode); + } + mutex_unlock(&bdev->bd_mutex); +} -EXPORT_SYMBOL(file_update_time); +/* + * Update the ctime and mtime stamps after checking if they are to be updated. + */ +void mapping_update_time(struct address_space *mapping) +{ + if (test_and_clear_bit(AS_MCTIME, &mapping->flags)) { + if (S_ISBLK(mapping->host->i_mode)) + bd_inode_update_time(mapping->host); + else + inode_update_time(mapping->host); + } +} int inode_needs_sync(struct inode *inode) { @@ -1290,7 +1319,6 @@ int inode_needs_sync(struct inode *inode) return 1; return 0; } - EXPORT_SYMBOL(inode_needs_sync); int inode_wait(void *word) diff --git a/fs/sync.c b/fs/sync.c index 7cd005e..5561464 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -87,6 +87,8 @@ long do_fsync(struct file *file, int datasync) goto out; } + mapping_update_time(mapping); + ret = filemap_fdatawrite(mapping); /* diff --git a/include/linux/fs.h b/include/linux/fs.h index b3ec4a4..1dccd4b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1977,7 +1977,14 @@ extern int buffer_migrate_page(struct address_space *, extern int inode_change_ok(struct inode *, struct iattr *); extern int __must_check inode_setattr(struct inode *, struct iattr *); -extern void file_update_time(struct file *file); +extern void inode_update_time(struct inode *); + +static inline void file_update_time(struct file *file) +{ + inode_update_time(file->f_path.dentry->d_inode); +} + +extern void mapping_update_time(struct address_space *); static inline ino_t parent_ino(struct dentry *dentry) { diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index db8a410..bf0f9e7 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -17,8 +17,9 @@ * Bits in mapping->flags. The lower __GFP_BITS_SHIFT bits are the page * allocation mode flags. */ -#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ +#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ #define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */ +#define AS_MCTIME (__GFP_BITS_SHIFT + 2) /* mtime and ctime to update */ static inline void mapping_set_error(struct address_space *mapping, int error) { diff --git a/mm/msync.c b/mm/msync.c index ff654c9..07dc8fc 100644 --- a/mm/msync.c +++ b/mm/msync.c @@ -5,6 +5,7 @@ * Copyright (C) 1994-1999 Linus Torvalds * * Substantial code cleanup. + * Updating the ctime and mtime stamps for memory-mapped files. * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com> */ @@ -22,6 +23,10 @@ * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). * Now it doesn't do anything, since dirty pages are properly tracked. * + * The msync() system call updates the ctime and mtime fields for + * the mapped file when called with the MS_SYNC or MS_ASYNC flags + * according to the POSIX standard. + * * The application may now run fsync() to * write out the dirty pages and wait on the writeout and check the result. * Or the application may run fadvise(FADV_DONTNEED) against the fd to start @@ -76,14 +81,17 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) break; } file = vma->vm_file; - if ((flags & MS_SYNC) && file && (vma->vm_flags & VM_SHARED)) { - get_file(file); - up_read(&mm->mmap_sem); - error = do_fsync(file, 0); - fput(file); - if (error) - return error; - down_read(&mm->mmap_sem); + if (file && (vma->vm_flags & VM_SHARED)) { + mapping_update_time(file->f_mapping); + if (flags & MS_SYNC) { + get_file(file); + up_read(&mm->mmap_sem); + error = do_fsync(file, 0); + fput(file); + if (error) + return error; + down_read(&mm->mmap_sem); + } } start = vma->vm_end; diff --git a/mm/page-writeback.c b/mm/page-writeback.c index d55cfca..a85df0b 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1025,6 +1025,7 @@ int __set_page_dirty_nobuffers(struct page *page) if (mapping->host) { /* !PageAnon && !swapper_space */ __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); + set_bit(AS_MCTIME, &mapping->flags); } return 1; } -- 1.4.4.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-13 4:39 ` [PATCH 2/2] updating ctime and mtime at syncing Anton Salikhmetov @ 2008-01-13 4:59 ` Rik van Riel 2008-01-14 11:08 ` Miklos Szeredi 1 sibling, 0 replies; 20+ messages in thread From: Rik van Riel @ 2008-01-13 4:59 UTC (permalink / raw) To: Anton Salikhmetov Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb On Sun, 13 Jan 2008 07:39:59 +0300 Anton Salikhmetov <salikhmetov@gmail.com> wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > Changes for updating the ctime and mtime fields for memory-mapped files: > > 1) new flag triggering update of the inode data; > 2) new function to update ctime and mtime for block device files; > 3) new helper function to update ctime and mtime when needed; > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > 5) implementing the feature of auto-updating ctime and mtime. > > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> -- All rights reversed. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-13 4:39 ` [PATCH 2/2] updating ctime and mtime at syncing Anton Salikhmetov 2008-01-13 4:59 ` Rik van Riel @ 2008-01-14 11:08 ` Miklos Szeredi 2008-01-14 11:15 ` Miklos Szeredi 2008-01-14 12:22 ` Anton Salikhmetov 1 sibling, 2 replies; 20+ messages in thread From: Miklos Szeredi @ 2008-01-14 11:08 UTC (permalink / raw) To: salikhmetov Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > Changes for updating the ctime and mtime fields for memory-mapped files: > > 1) new flag triggering update of the inode data; > 2) new function to update ctime and mtime for block device files; > 3) new helper function to update ctime and mtime when needed; > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > 5) implementing the feature of auto-updating ctime and mtime. How exactly is this done? Is this catering for this case: 1 page is dirtied through mapping 2 app calls msync(MS_ASYNC) 3 page is written again through mapping 4 app calls msync(MS_ASYNC) 5 ... 6 page is written back What happens at 4? Do we care about this one at all? More comments inline. > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> > --- > fs/buffer.c | 1 + > fs/fs-writeback.c | 2 ++ > fs/inode.c | 42 +++++++++++++++++++++++++++++++++++------- > fs/sync.c | 2 ++ > include/linux/fs.h | 9 ++++++++- > include/linux/pagemap.h | 3 ++- > mm/msync.c | 24 ++++++++++++++++-------- > mm/page-writeback.c | 1 + > 8 files changed, 67 insertions(+), 17 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index 7249e01..09adf7e 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -719,6 +719,7 @@ static int __set_page_dirty(struct page *page, > } > write_unlock_irq(&mapping->tree_lock); > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > + set_bit(AS_MCTIME, &mapping->flags); > > return 1; > } > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index 0fca820..c25ebd5 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -243,6 +243,8 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc) > > spin_unlock(&inode_lock); > > + mapping_update_time(mapping); > + > ret = do_writepages(mapping, wbc); > > /* Don't write the inode if only I_DIRTY_PAGES was set */ > diff --git a/fs/inode.c b/fs/inode.c > index ed35383..c02bfab 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -1243,8 +1243,8 @@ void touch_atime(struct vfsmount *mnt, struct dentry *dentry) > EXPORT_SYMBOL(touch_atime); > > /** > - * file_update_time - update mtime and ctime time > - * @file: file accessed > + * inode_update_time - update mtime and ctime time > + * @inode: inode accessed > * > * Update the mtime and ctime members of an inode and mark the inode > * for writeback. Note that this function is meant exclusively for > @@ -1253,10 +1253,8 @@ EXPORT_SYMBOL(touch_atime); > * S_NOCTIME inode flag, e.g. for network filesystem where these > * timestamps are handled by the server. > */ > - > -void file_update_time(struct file *file) > +void inode_update_time(struct inode *inode) > { > - struct inode *inode = file->f_path.dentry->d_inode; > struct timespec now; > int sync_it = 0; > > @@ -1279,8 +1277,39 @@ void file_update_time(struct file *file) > if (sync_it) > mark_inode_dirty_sync(inode); > } > +EXPORT_SYMBOL(inode_update_time); > + > +/* > + * Update the ctime and mtime stamps for memory-mapped block device files. > + */ > +static void bd_inode_update_time(struct inode *inode) > +{ > + struct block_device *bdev = inode->i_bdev; > + struct list_head *p; > + > + if (bdev == NULL) > + return; > + > + mutex_lock(&bdev->bd_mutex); > + list_for_each(p, &bdev->bd_inodes) { > + inode = list_entry(p, struct inode, i_devices); > + inode_update_time(inode); > + } > + mutex_unlock(&bdev->bd_mutex); > +} Umm, why not just call with file->f_dentry->d_inode, so that you don't need to do this ugly search for the physical inode? The file pointer is available in both msync and fsync. > -EXPORT_SYMBOL(file_update_time); > +/* > + * Update the ctime and mtime stamps after checking if they are to be updated. > + */ > +void mapping_update_time(struct address_space *mapping) > +{ > + if (test_and_clear_bit(AS_MCTIME, &mapping->flags)) { > + if (S_ISBLK(mapping->host->i_mode)) > + bd_inode_update_time(mapping->host); > + else > + inode_update_time(mapping->host); > + } > +} > > int inode_needs_sync(struct inode *inode) > { > @@ -1290,7 +1319,6 @@ int inode_needs_sync(struct inode *inode) > return 1; > return 0; > } > - > EXPORT_SYMBOL(inode_needs_sync); > > int inode_wait(void *word) > diff --git a/fs/sync.c b/fs/sync.c > index 7cd005e..5561464 100644 > --- a/fs/sync.c > +++ b/fs/sync.c > @@ -87,6 +87,8 @@ long do_fsync(struct file *file, int datasync) > goto out; > } > > + mapping_update_time(mapping); > + > ret = filemap_fdatawrite(mapping); > > /* > diff --git a/include/linux/fs.h b/include/linux/fs.h > index b3ec4a4..1dccd4b 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1977,7 +1977,14 @@ extern int buffer_migrate_page(struct address_space *, > extern int inode_change_ok(struct inode *, struct iattr *); > extern int __must_check inode_setattr(struct inode *, struct iattr *); > > -extern void file_update_time(struct file *file); > +extern void inode_update_time(struct inode *); > + > +static inline void file_update_time(struct file *file) > +{ > + inode_update_time(file->f_path.dentry->d_inode); > +} > + > +extern void mapping_update_time(struct address_space *); > > static inline ino_t parent_ino(struct dentry *dentry) > { > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > index db8a410..bf0f9e7 100644 > --- a/include/linux/pagemap.h > +++ b/include/linux/pagemap.h > @@ -17,8 +17,9 @@ > * Bits in mapping->flags. The lower __GFP_BITS_SHIFT bits are the page > * allocation mode flags. > */ > -#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ > +#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ > #define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */ > +#define AS_MCTIME (__GFP_BITS_SHIFT + 2) /* mtime and ctime to update */ > > static inline void mapping_set_error(struct address_space *mapping, int error) > { > diff --git a/mm/msync.c b/mm/msync.c > index ff654c9..07dc8fc 100644 > --- a/mm/msync.c > +++ b/mm/msync.c > @@ -5,6 +5,7 @@ > * Copyright (C) 1994-1999 Linus Torvalds > * > * Substantial code cleanup. > + * Updating the ctime and mtime stamps for memory-mapped files. > * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com> > */ > > @@ -22,6 +23,10 @@ > * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). > * Now it doesn't do anything, since dirty pages are properly tracked. > * > + * The msync() system call updates the ctime and mtime fields for > + * the mapped file when called with the MS_SYNC or MS_ASYNC flags > + * according to the POSIX standard. > + * > * The application may now run fsync() to > * write out the dirty pages and wait on the writeout and check the result. > * Or the application may run fadvise(FADV_DONTNEED) against the fd to start > @@ -76,14 +81,17 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) > break; > } > file = vma->vm_file; > - if ((flags & MS_SYNC) && file && (vma->vm_flags & VM_SHARED)) { > - get_file(file); > - up_read(&mm->mmap_sem); > - error = do_fsync(file, 0); > - fput(file); > - if (error) > - return error; > - down_read(&mm->mmap_sem); > + if (file && (vma->vm_flags & VM_SHARED)) { > + mapping_update_time(file->f_mapping); > + if (flags & MS_SYNC) { > + get_file(file); > + up_read(&mm->mmap_sem); > + error = do_fsync(file, 0); > + fput(file); > + if (error) > + return error; > + down_read(&mm->mmap_sem); > + } > } > > start = vma->vm_end; > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index d55cfca..a85df0b 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1025,6 +1025,7 @@ int __set_page_dirty_nobuffers(struct page *page) > if (mapping->host) { > /* !PageAnon && !swapper_space */ > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > + set_bit(AS_MCTIME, &mapping->flags); > } > return 1; > } > -- > 1.4.4.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-14 11:08 ` Miklos Szeredi @ 2008-01-14 11:15 ` Miklos Szeredi 2008-01-14 12:25 ` Anton Salikhmetov 2008-01-14 12:22 ` Anton Salikhmetov 1 sibling, 1 reply; 20+ messages in thread From: Miklos Szeredi @ 2008-01-14 11:15 UTC (permalink / raw) To: salikhmetov Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb > > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > > > Changes for updating the ctime and mtime fields for memory-mapped files: > > > > 1) new flag triggering update of the inode data; > > 2) new function to update ctime and mtime for block device files; > > 3) new helper function to update ctime and mtime when needed; > > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > > 5) implementing the feature of auto-updating ctime and mtime. > > How exactly is this done? > > Is this catering for this case: > > 1 page is dirtied through mapping > 2 app calls msync(MS_ASYNC) > 3 page is written again through mapping > 4 app calls msync(MS_ASYNC) > 5 ... > 6 page is written back > > What happens at 4? Do we care about this one at all? Oh, and here's a test program I wrote, that can be used to check this behavior. It has two options: -s use MS_SYNC instead of MS_ASYNC -f fork and do the msync on a different mapping Back then I haven't found a single OS, that fully conformed to all the stupid POSIX rules regarding mmaps and ctime/mtime. Miklos ---- #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> #include <fcntl.h> #include <sys/mman.h> #include <sys/stat.h> #include <sys/wait.h> static const char *filename; static int msync_flag = MS_ASYNC; static int msync_fork = 0; static void print_times(const char *msg) { struct stat stbuf; stat(filename, &stbuf); printf("%s\t%li\t%li\t%li\n", msg, stbuf.st_ctime, stbuf.st_mtime, stbuf.st_atime); } static void do_msync(void *addr, int len) { int res; if (!msync_fork) { res = msync(addr, len, msync_flag); if (res == -1) { perror("msync"); exit(1); } } else { int pid = fork(); if (pid == -1) { perror("fork"); exit(1); } if (!pid) { int fd = open(filename, O_RDWR); if (fd == -1) { perror("open"); exit(1); } addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); if (addr == MAP_FAILED) { perror("mmap"); exit(1); } res = msync(addr, len, msync_flag); if (res == -1) { perror("msync"); exit(1); } exit(0); } wait(NULL); } } static void usage(const char *progname) { fprintf(stderr, "usage: %s filename [-sf]\n", progname); exit(1); } int main(int argc, char *argv[]) { int res; char *addr; int fd; if (argc < 2) usage(argv[0]); filename = argv[1]; if (argc > 2) { if (argc > 3) usage(argv[0]); if (strcmp(argv[2], "-s") == 0) msync_flag = MS_SYNC; else if (strcmp(argv[2], "-f") == 0) msync_fork = 1; else if (strcmp(argv[2], "-sf") == 0 || strcmp(argv[2], "-fs") == 0) { msync_flag = MS_SYNC; msync_fork = 1; } else usage(argv[0]); } fd = open(filename, O_RDWR | O_TRUNC | O_CREAT, 0666); if (fd == -1) { perror(filename); return 1; } print_times("begin"); sleep(1); write(fd, "aaaa\n", 4); print_times("write"); sleep(1); addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); if (addr == MAP_FAILED) { perror("mmap"); return 1; } print_times("mmap"); sleep(1); addr[1] = 'b'; print_times("b"); sleep(1); do_msync(addr, 4); print_times("msync b"); sleep(1); addr[2] = 'c'; print_times("c"); sleep(1); do_msync(addr, 4); print_times("msync c"); sleep(1); addr[3] = 'd'; print_times("d"); sleep(1); res = munmap(addr, 4); if (res == -1) { perror("munmap"); return 1; } print_times("munmap"); sleep(1); res = close(fd); if (res == -1) { perror("close"); return 1; } print_times("close"); sleep(1); sync(); print_times("sync"); return 0; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-14 11:15 ` Miklos Szeredi @ 2008-01-14 12:25 ` Anton Salikhmetov 0 siblings, 0 replies; 20+ messages in thread From: Anton Salikhmetov @ 2008-01-14 12:25 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb 2008/1/14, Miklos Szeredi <miklos@szeredi.hu>: > > > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > > > > > Changes for updating the ctime and mtime fields for memory-mapped files: > > > > > > 1) new flag triggering update of the inode data; > > > 2) new function to update ctime and mtime for block device files; > > > 3) new helper function to update ctime and mtime when needed; > > > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > > > 5) implementing the feature of auto-updating ctime and mtime. > > > > How exactly is this done? > > > > Is this catering for this case: > > > > 1 page is dirtied through mapping > > 2 app calls msync(MS_ASYNC) > > 3 page is written again through mapping > > 4 app calls msync(MS_ASYNC) > > 5 ... > > 6 page is written back > > > > What happens at 4? Do we care about this one at all? > > Oh, and here's a test program I wrote, that can be used to check this > behavior. It has two options: > > -s use MS_SYNC instead of MS_ASYNC > -f fork and do the msync on a different mapping > > Back then I haven't found a single OS, that fully conformed to all the > stupid POSIX rules regarding mmaps and ctime/mtime. Thank you very much for sharing your code. I'll integrate the MS_ASYNC and fork() test cases into my own unit test. > > Miklos > ---- > > #include <stdio.h> > #include <stdlib.h> > #include <unistd.h> > #include <string.h> > #include <fcntl.h> > #include <sys/mman.h> > #include <sys/stat.h> > #include <sys/wait.h> > > static const char *filename; > static int msync_flag = MS_ASYNC; > static int msync_fork = 0; > > static void print_times(const char *msg) > { > struct stat stbuf; > stat(filename, &stbuf); > printf("%s\t%li\t%li\t%li\n", msg, stbuf.st_ctime, stbuf.st_mtime, > stbuf.st_atime); > } > > static void do_msync(void *addr, int len) > { > int res; > if (!msync_fork) { > res = msync(addr, len, msync_flag); > if (res == -1) { > perror("msync"); > exit(1); > } > } else { > int pid = fork(); > if (pid == -1) { > perror("fork"); > exit(1); > } > if (!pid) { > int fd = open(filename, O_RDWR); > if (fd == -1) { > perror("open"); > exit(1); > } > addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); > if (addr == MAP_FAILED) { > perror("mmap"); > exit(1); > } > res = msync(addr, len, msync_flag); > if (res == -1) { > perror("msync"); > exit(1); > } > exit(0); > } > wait(NULL); > } > } > > static void usage(const char *progname) > { > fprintf(stderr, "usage: %s filename [-sf]\n", progname); > exit(1); > } > > int main(int argc, char *argv[]) > { > int res; > char *addr; > int fd; > > if (argc < 2) > usage(argv[0]); > > filename = argv[1]; > if (argc > 2) { > if (argc > 3) > usage(argv[0]); > if (strcmp(argv[2], "-s") == 0) > msync_flag = MS_SYNC; > else if (strcmp(argv[2], "-f") == 0) > msync_fork = 1; > else if (strcmp(argv[2], "-sf") == 0 || strcmp(argv[2], "-fs") == 0) { > msync_flag = MS_SYNC; > msync_fork = 1; > } else > usage(argv[0]); > } > > fd = open(filename, O_RDWR | O_TRUNC | O_CREAT, 0666); > if (fd == -1) { > perror(filename); > return 1; > } > print_times("begin"); > sleep(1); > write(fd, "aaaa\n", 4); > print_times("write"); > sleep(1); > addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); > if (addr == MAP_FAILED) { > perror("mmap"); > return 1; > } > print_times("mmap"); > sleep(1); > > addr[1] = 'b'; > print_times("b"); > sleep(1); > do_msync(addr, 4); > print_times("msync b"); > sleep(1); > > addr[2] = 'c'; > print_times("c"); > sleep(1); > do_msync(addr, 4); > print_times("msync c"); > sleep(1); > > addr[3] = 'd'; > print_times("d"); > sleep(1); > res = munmap(addr, 4); > if (res == -1) { > perror("munmap"); > return 1; > } > print_times("munmap"); > sleep(1); > > res = close(fd); > if (res == -1) { > perror("close"); > return 1; > } > print_times("close"); > sleep(1); > sync(); > print_times("sync"); > > return 0; > } > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-14 11:08 ` Miklos Szeredi 2008-01-14 11:15 ` Miklos Szeredi @ 2008-01-14 12:22 ` Anton Salikhmetov 2008-01-14 13:14 ` Miklos Szeredi 1 sibling, 1 reply; 20+ messages in thread From: Anton Salikhmetov @ 2008-01-14 12:22 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb 2008/1/14, Miklos Szeredi <miklos@szeredi.hu>: > > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > > > Changes for updating the ctime and mtime fields for memory-mapped files: > > > > 1) new flag triggering update of the inode data; > > 2) new function to update ctime and mtime for block device files; > > 3) new helper function to update ctime and mtime when needed; > > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > > 5) implementing the feature of auto-updating ctime and mtime. > > How exactly is this done? > > Is this catering for this case: > > 1 page is dirtied through mapping > 2 app calls msync(MS_ASYNC) > 3 page is written again through mapping > 4 app calls msync(MS_ASYNC) > 5 ... > 6 page is written back > > What happens at 4? Do we care about this one at all? The POSIX standard requires updating the file times every time when msync() is called with MS_ASYNC. I.e. the time stamps should be updated even when no physical synchronization is being done immediately. At least, this is how I undestand the standard. Please tell me if I am wrong. > > More comments inline. > > > Signed-off-by: Anton Salikhmetov <salikhmetov@gmail.com> > > --- > > fs/buffer.c | 1 + > > fs/fs-writeback.c | 2 ++ > > fs/inode.c | 42 +++++++++++++++++++++++++++++++++++------- > > fs/sync.c | 2 ++ > > include/linux/fs.h | 9 ++++++++- > > include/linux/pagemap.h | 3 ++- > > mm/msync.c | 24 ++++++++++++++++-------- > > mm/page-writeback.c | 1 + > > 8 files changed, 67 insertions(+), 17 deletions(-) > > > > diff --git a/fs/buffer.c b/fs/buffer.c > > index 7249e01..09adf7e 100644 > > --- a/fs/buffer.c > > +++ b/fs/buffer.c > > @@ -719,6 +719,7 @@ static int __set_page_dirty(struct page *page, > > } > > write_unlock_irq(&mapping->tree_lock); > > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > + set_bit(AS_MCTIME, &mapping->flags); > > > > return 1; > > } > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > > index 0fca820..c25ebd5 100644 > > --- a/fs/fs-writeback.c > > +++ b/fs/fs-writeback.c > > @@ -243,6 +243,8 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc) > > > > spin_unlock(&inode_lock); > > > > + mapping_update_time(mapping); > > + > > ret = do_writepages(mapping, wbc); > > > > /* Don't write the inode if only I_DIRTY_PAGES was set */ > > diff --git a/fs/inode.c b/fs/inode.c > > index ed35383..c02bfab 100644 > > --- a/fs/inode.c > > +++ b/fs/inode.c > > @@ -1243,8 +1243,8 @@ void touch_atime(struct vfsmount *mnt, struct dentry *dentry) > > EXPORT_SYMBOL(touch_atime); > > > > /** > > - * file_update_time - update mtime and ctime time > > - * @file: file accessed > > + * inode_update_time - update mtime and ctime time > > + * @inode: inode accessed > > * > > * Update the mtime and ctime members of an inode and mark the inode > > * for writeback. Note that this function is meant exclusively for > > @@ -1253,10 +1253,8 @@ EXPORT_SYMBOL(touch_atime); > > * S_NOCTIME inode flag, e.g. for network filesystem where these > > * timestamps are handled by the server. > > */ > > - > > -void file_update_time(struct file *file) > > +void inode_update_time(struct inode *inode) > > { > > - struct inode *inode = file->f_path.dentry->d_inode; > > struct timespec now; > > int sync_it = 0; > > > > @@ -1279,8 +1277,39 @@ void file_update_time(struct file *file) > > if (sync_it) > > mark_inode_dirty_sync(inode); > > } > > +EXPORT_SYMBOL(inode_update_time); > > + > > +/* > > + * Update the ctime and mtime stamps for memory-mapped block device files. > > + */ > > +static void bd_inode_update_time(struct inode *inode) > > +{ > > + struct block_device *bdev = inode->i_bdev; > > + struct list_head *p; > > + > > + if (bdev == NULL) > > + return; > > + > > + mutex_lock(&bdev->bd_mutex); > > + list_for_each(p, &bdev->bd_inodes) { > > + inode = list_entry(p, struct inode, i_devices); > > + inode_update_time(inode); > > + } > > + mutex_unlock(&bdev->bd_mutex); > > +} > > Umm, why not just call with file->f_dentry->d_inode, so that you don't > need to do this ugly search for the physical inode? The file pointer > is available in both msync and fsync. I'm not sure if I undestood your question. I see two possible interpretations for this question, and I'm answering both. The intention was to make the data changes in the block device data visible to all device files associated with the block device. Hence the search, because the time stamps for all such device files should be updated as well. Not only the sys_msync() and do_fsync() routines call the helper function mapping_update_time(). Not all call sites of the helper function have the file pointer available. Therefore, I pass the mapping pointer to the helper function in order to make this function adequate for all situations. > > > -EXPORT_SYMBOL(file_update_time); > > +/* > > + * Update the ctime and mtime stamps after checking if they are to be updated. > > + */ > > +void mapping_update_time(struct address_space *mapping) > > +{ > > + if (test_and_clear_bit(AS_MCTIME, &mapping->flags)) { > > + if (S_ISBLK(mapping->host->i_mode)) > > + bd_inode_update_time(mapping->host); > > + else > > + inode_update_time(mapping->host); > > + } > > +} > > > > int inode_needs_sync(struct inode *inode) > > { > > @@ -1290,7 +1319,6 @@ int inode_needs_sync(struct inode *inode) > > return 1; > > return 0; > > } > > - > > EXPORT_SYMBOL(inode_needs_sync); > > > > int inode_wait(void *word) > > diff --git a/fs/sync.c b/fs/sync.c > > index 7cd005e..5561464 100644 > > --- a/fs/sync.c > > +++ b/fs/sync.c > > @@ -87,6 +87,8 @@ long do_fsync(struct file *file, int datasync) > > goto out; > > } > > > > + mapping_update_time(mapping); > > + > > ret = filemap_fdatawrite(mapping); > > > > /* > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index b3ec4a4..1dccd4b 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > @@ -1977,7 +1977,14 @@ extern int buffer_migrate_page(struct address_space *, > > extern int inode_change_ok(struct inode *, struct iattr *); > > extern int __must_check inode_setattr(struct inode *, struct iattr *); > > > > -extern void file_update_time(struct file *file); > > +extern void inode_update_time(struct inode *); > > + > > +static inline void file_update_time(struct file *file) > > +{ > > + inode_update_time(file->f_path.dentry->d_inode); > > +} > > + > > +extern void mapping_update_time(struct address_space *); > > > > static inline ino_t parent_ino(struct dentry *dentry) > > { > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > > index db8a410..bf0f9e7 100644 > > --- a/include/linux/pagemap.h > > +++ b/include/linux/pagemap.h > > @@ -17,8 +17,9 @@ > > * Bits in mapping->flags. The lower __GFP_BITS_SHIFT bits are the page > > * allocation mode flags. > > */ > > -#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ > > +#define AS_EIO (__GFP_BITS_SHIFT + 0) /* IO error on async write */ > > #define AS_ENOSPC (__GFP_BITS_SHIFT + 1) /* ENOSPC on async write */ > > +#define AS_MCTIME (__GFP_BITS_SHIFT + 2) /* mtime and ctime to update */ > > > > static inline void mapping_set_error(struct address_space *mapping, int error) > > { > > diff --git a/mm/msync.c b/mm/msync.c > > index ff654c9..07dc8fc 100644 > > --- a/mm/msync.c > > +++ b/mm/msync.c > > @@ -5,6 +5,7 @@ > > * Copyright (C) 1994-1999 Linus Torvalds > > * > > * Substantial code cleanup. > > + * Updating the ctime and mtime stamps for memory-mapped files. > > * Copyright (C) 2008 Anton Salikhmetov <salikhmetov@gmail.com> > > */ > > > > @@ -22,6 +23,10 @@ > > * Nor does it mark the relevant pages dirty (it used to up to 2.6.17). > > * Now it doesn't do anything, since dirty pages are properly tracked. > > * > > + * The msync() system call updates the ctime and mtime fields for > > + * the mapped file when called with the MS_SYNC or MS_ASYNC flags > > + * according to the POSIX standard. > > + * > > * The application may now run fsync() to > > * write out the dirty pages and wait on the writeout and check the result. > > * Or the application may run fadvise(FADV_DONTNEED) against the fd to start > > @@ -76,14 +81,17 @@ asmlinkage long sys_msync(unsigned long start, size_t len, int flags) > > break; > > } > > file = vma->vm_file; > > - if ((flags & MS_SYNC) && file && (vma->vm_flags & VM_SHARED)) { > > - get_file(file); > > - up_read(&mm->mmap_sem); > > - error = do_fsync(file, 0); > > - fput(file); > > - if (error) > > - return error; > > - down_read(&mm->mmap_sem); > > + if (file && (vma->vm_flags & VM_SHARED)) { > > + mapping_update_time(file->f_mapping); > > + if (flags & MS_SYNC) { > > + get_file(file); > > + up_read(&mm->mmap_sem); > > + error = do_fsync(file, 0); > > + fput(file); > > + if (error) > > + return error; > > + down_read(&mm->mmap_sem); > > + } > > } > > > > start = vma->vm_end; > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > > index d55cfca..a85df0b 100644 > > --- a/mm/page-writeback.c > > +++ b/mm/page-writeback.c > > @@ -1025,6 +1025,7 @@ int __set_page_dirty_nobuffers(struct page *page) > > if (mapping->host) { > > /* !PageAnon && !swapper_space */ > > __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); > > + set_bit(AS_MCTIME, &mapping->flags); > > } > > return 1; > > } > > -- > > 1.4.4.4 > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-14 12:22 ` Anton Salikhmetov @ 2008-01-14 13:14 ` Miklos Szeredi 2008-01-14 13:35 ` Peter Zijlstra 2008-01-14 18:59 ` Anton Salikhmetov 0 siblings, 2 replies; 20+ messages in thread From: Miklos Szeredi @ 2008-01-14 13:14 UTC (permalink / raw) To: salikhmetov Cc: miklos, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb > 2008/1/14, Miklos Szeredi <miklos@szeredi.hu>: > > > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > > > > > Changes for updating the ctime and mtime fields for memory-mapped files: > > > > > > 1) new flag triggering update of the inode data; > > > 2) new function to update ctime and mtime for block device files; > > > 3) new helper function to update ctime and mtime when needed; > > > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > > > 5) implementing the feature of auto-updating ctime and mtime. > > > > How exactly is this done? > > > > Is this catering for this case: > > > > 1 page is dirtied through mapping > > 2 app calls msync(MS_ASYNC) > > 3 page is written again through mapping > > 4 app calls msync(MS_ASYNC) > > 5 ... > > 6 page is written back > > > > What happens at 4? Do we care about this one at all? > > The POSIX standard requires updating the file times every time when msync() > is called with MS_ASYNC. I.e. the time stamps should be updated even > when no physical synchronization is being done immediately. Yes. However, on linux MS_ASYNC is basically a no-op, and without doing _something_ with the dirty pages (which afaics your patch doesn't do), it's impossible to observe later writes to the same page. I don't advocate full POSIX conformance anymore, because it's probably too expensive to do (I've tried). Rather than that, we should probably find some sane compromise, that just fixes the real life issue. Here's a pointer to the thread about this: http://lkml.org/lkml/2007/3/27/55 Your patch may be a good soultion, but you should describe in detail what it does when pages are dirtied, and when msync/fsync are called, and what happens with multiple msync calls that I've asked about. I suspect your patch is ignoring writes after the first msync, but then why care about msync at all? What's so special about the _first_ msync? Is it just that most test programs only check this, and not what happens if msync is called more than once? That would be a bug in the test cases. > > > +/* > > > + * Update the ctime and mtime stamps for memory-mapped block device files. > > > + */ > > > +static void bd_inode_update_time(struct inode *inode) > > > +{ > > > + struct block_device *bdev = inode->i_bdev; > > > + struct list_head *p; > > > + > > > + if (bdev == NULL) > > > + return; > > > + > > > + mutex_lock(&bdev->bd_mutex); > > > + list_for_each(p, &bdev->bd_inodes) { > > > + inode = list_entry(p, struct inode, i_devices); > > > + inode_update_time(inode); > > > + } > > > + mutex_unlock(&bdev->bd_mutex); > > > +} > > > > Umm, why not just call with file->f_dentry->d_inode, so that you don't > > need to do this ugly search for the physical inode? The file pointer > > is available in both msync and fsync. > > I'm not sure if I undestood your question. I see two possible > interpretations for this question, and I'm answering both. > > The intention was to make the data changes in the block device data > visible to all device files associated with the block device. Hence > the search, because the time stamps for all such device files should > be updated as well. Ahh, but it will only update "active" devices, which are currently open, no? Is that what we want? > Not only the sys_msync() and do_fsync() routines call the helper > function mapping_update_time(). Ah yes, __sync_single_inode() calls it as well. Why? Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-14 13:14 ` Miklos Szeredi @ 2008-01-14 13:35 ` Peter Zijlstra 2008-01-14 13:39 ` Peter Zijlstra 2008-01-14 14:17 ` Anton Salikhmetov 2008-01-14 18:59 ` Anton Salikhmetov 1 sibling, 2 replies; 20+ messages in thread From: Peter Zijlstra @ 2008-01-14 13:35 UTC (permalink / raw) To: Miklos Szeredi Cc: salikhmetov, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, akpm, protasnb On Mon, 2008-01-14 at 14:14 +0100, Miklos Szeredi wrote: > > 2008/1/14, Miklos Szeredi <miklos@szeredi.hu>: > > > > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > > > > > > > Changes for updating the ctime and mtime fields for memory-mapped files: > > > > > > > > 1) new flag triggering update of the inode data; > > > > 2) new function to update ctime and mtime for block device files; > > > > 3) new helper function to update ctime and mtime when needed; > > > > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > > > > 5) implementing the feature of auto-updating ctime and mtime. > > > > > > How exactly is this done? > > > > > > Is this catering for this case: > > > > > > 1 page is dirtied through mapping > > > 2 app calls msync(MS_ASYNC) > > > 3 page is written again through mapping > > > 4 app calls msync(MS_ASYNC) > > > 5 ... > > > 6 page is written back > > > > > > What happens at 4? Do we care about this one at all? > > > > The POSIX standard requires updating the file times every time when msync() > > is called with MS_ASYNC. I.e. the time stamps should be updated even > > when no physical synchronization is being done immediately. > > Yes. However, on linux MS_ASYNC is basically a no-op, and without > doing _something_ with the dirty pages (which afaics your patch > doesn't do), it's impossible to observe later writes to the same page. > > I don't advocate full POSIX conformance anymore, because it's probably > too expensive to do (I've tried). Rather than that, we should > probably find some sane compromise, that just fixes the real life > issue. Here's a pointer to the thread about this: > > http://lkml.org/lkml/2007/3/27/55 > > Your patch may be a good soultion, but you should describe in detail > what it does when pages are dirtied, and when msync/fsync are called, > and what happens with multiple msync calls that I've asked about. > > I suspect your patch is ignoring writes after the first msync, but > then why care about msync at all? What's so special about the _first_ > msync? Is it just that most test programs only check this, and not > what happens if msync is called more than once? That would be a bug > in the test cases. I must agree, doing the mmap dirty, MS_ASYNC, mmap retouch, MS_ASYNC case correctly would need a lot more code which I doubt is worth the effort. It would require scanning the PTEs and marking them read-only again on MS_ASYNC, and some more logic in set_page_dirty() because that currently bails out early if the page in question is already dirty. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-14 13:35 ` Peter Zijlstra @ 2008-01-14 13:39 ` Peter Zijlstra 2008-01-14 13:45 ` Miklos Szeredi 2008-01-14 14:17 ` Anton Salikhmetov 1 sibling, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2008-01-14 13:39 UTC (permalink / raw) To: Miklos Szeredi Cc: salikhmetov, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, akpm, protasnb, Nick Piggin On Mon, 2008-01-14 at 14:35 +0100, Peter Zijlstra wrote: > On Mon, 2008-01-14 at 14:14 +0100, Miklos Szeredi wrote: > > > 2008/1/14, Miklos Szeredi <miklos@szeredi.hu>: > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > > > > > > > > > Changes for updating the ctime and mtime fields for memory-mapped files: > > > > > > > > > > 1) new flag triggering update of the inode data; > > > > > 2) new function to update ctime and mtime for block device files; > > > > > 3) new helper function to update ctime and mtime when needed; > > > > > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > > > > > 5) implementing the feature of auto-updating ctime and mtime. > > > > > > > > How exactly is this done? > > > > > > > > Is this catering for this case: > > > > > > > > 1 page is dirtied through mapping > > > > 2 app calls msync(MS_ASYNC) > > > > 3 page is written again through mapping > > > > 4 app calls msync(MS_ASYNC) > > > > 5 ... > > > > 6 page is written back > > > > > > > > What happens at 4? Do we care about this one at all? > > > > > > The POSIX standard requires updating the file times every time when msync() > > > is called with MS_ASYNC. I.e. the time stamps should be updated even > > > when no physical synchronization is being done immediately. > > > > Yes. However, on linux MS_ASYNC is basically a no-op, and without > > doing _something_ with the dirty pages (which afaics your patch > > doesn't do), it's impossible to observe later writes to the same page. > > > > I don't advocate full POSIX conformance anymore, because it's probably > > too expensive to do (I've tried). Rather than that, we should > > probably find some sane compromise, that just fixes the real life > > issue. Here's a pointer to the thread about this: > > > > http://lkml.org/lkml/2007/3/27/55 > > > > Your patch may be a good soultion, but you should describe in detail > > what it does when pages are dirtied, and when msync/fsync are called, > > and what happens with multiple msync calls that I've asked about. > > > > I suspect your patch is ignoring writes after the first msync, but > > then why care about msync at all? What's so special about the _first_ > > msync? Is it just that most test programs only check this, and not > > what happens if msync is called more than once? That would be a bug > > in the test cases. > > I must agree, doing the mmap dirty, MS_ASYNC, mmap retouch, MS_ASYNC > case correctly would need a lot more code which I doubt is worth the > effort. > > It would require scanning the PTEs and marking them read-only again on > MS_ASYNC, and some more logic in set_page_dirty() because that currently > bails out early if the page in question is already dirty. More fun, it would require marking them RO but leaving the dirty bit set, because this ext3 fudge where we confuse the page dirty state - or did that get fixed? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-14 13:39 ` Peter Zijlstra @ 2008-01-14 13:45 ` Miklos Szeredi 2008-01-14 13:47 ` Miklos Szeredi 0 siblings, 1 reply; 20+ messages in thread From: Miklos Szeredi @ 2008-01-14 13:45 UTC (permalink / raw) To: peterz Cc: salikhmetov, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, akpm, protasnb, nickpiggin > More fun, it would require marking them RO but leaving the dirty bit > set, because this ext3 fudge where we confuse the page dirty state - or > did that get fixed? That got fixed by Nick, I think. The alternative to marking pages RO, is to walk the PTEs in MS_ASYNC, note the dirty bit and mark pages clean. But it's possibly even more complicated. Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-14 13:45 ` Miklos Szeredi @ 2008-01-14 13:47 ` Miklos Szeredi 0 siblings, 0 replies; 20+ messages in thread From: Miklos Szeredi @ 2008-01-14 13:47 UTC (permalink / raw) To: miklos Cc: peterz, salikhmetov, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, akpm, protasnb, nickpiggin > > More fun, it would require marking them RO but leaving the dirty bit > > set, because this ext3 fudge where we confuse the page dirty state - or > > did that get fixed? > > That got fixed by Nick, I think. > > The alternative to marking pages RO, is to walk the PTEs in MS_ASYNC, > note the dirty bit and mark pages clean. But it's possibly even more ^^^^ ptes, I mean -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-14 13:35 ` Peter Zijlstra 2008-01-14 13:39 ` Peter Zijlstra @ 2008-01-14 14:17 ` Anton Salikhmetov 2008-01-15 9:53 ` Miklos Szeredi 1 sibling, 1 reply; 20+ messages in thread From: Anton Salikhmetov @ 2008-01-14 14:17 UTC (permalink / raw) To: Peter Zijlstra Cc: Miklos Szeredi, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, akpm, protasnb 2008/1/14, Peter Zijlstra <a.p.zijlstra@chello.nl>: > > On Mon, 2008-01-14 at 14:14 +0100, Miklos Szeredi wrote: > > > 2008/1/14, Miklos Szeredi <miklos@szeredi.hu>: > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > > > > > > > > > Changes for updating the ctime and mtime fields for memory-mapped files: > > > > > > > > > > 1) new flag triggering update of the inode data; > > > > > 2) new function to update ctime and mtime for block device files; > > > > > 3) new helper function to update ctime and mtime when needed; > > > > > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > > > > > 5) implementing the feature of auto-updating ctime and mtime. > > > > > > > > How exactly is this done? > > > > > > > > Is this catering for this case: > > > > > > > > 1 page is dirtied through mapping > > > > 2 app calls msync(MS_ASYNC) > > > > 3 page is written again through mapping > > > > 4 app calls msync(MS_ASYNC) > > > > 5 ... > > > > 6 page is written back > > > > > > > > What happens at 4? Do we care about this one at all? > > > > > > The POSIX standard requires updating the file times every time when msync() > > > is called with MS_ASYNC. I.e. the time stamps should be updated even > > > when no physical synchronization is being done immediately. > > > > Yes. However, on linux MS_ASYNC is basically a no-op, and without > > doing _something_ with the dirty pages (which afaics your patch > > doesn't do), it's impossible to observe later writes to the same page. > > > > I don't advocate full POSIX conformance anymore, because it's probably > > too expensive to do (I've tried). Rather than that, we should > > probably find some sane compromise, that just fixes the real life > > issue. Here's a pointer to the thread about this: > > > > http://lkml.org/lkml/2007/3/27/55 > > > > Your patch may be a good soultion, but you should describe in detail > > what it does when pages are dirtied, and when msync/fsync are called, > > and what happens with multiple msync calls that I've asked about. > > > > I suspect your patch is ignoring writes after the first msync, but > > then why care about msync at all? What's so special about the _first_ > > msync? Is it just that most test programs only check this, and not > > what happens if msync is called more than once? That would be a bug > > in the test cases. > > I must agree, doing the mmap dirty, MS_ASYNC, mmap retouch, MS_ASYNC > case correctly would need a lot more code which I doubt is worth the > effort. > > It would require scanning the PTEs and marking them read-only again on > MS_ASYNC, and some more logic in set_page_dirty() because that currently > bails out early if the page in question is already dirty. Thanks for your review, Peter and Miklos! I overlooked this case when AS_MCTIME flag has been turned off and the page is still dirty. On the other hand, the words "shall be marked for update" may be considered as just setting the AS_MCTIME flag, not updating the time stamps. What do you think about calling mapping_update_time() inside of "if (MS_SYNC & flags)"? I suggest such change because the code for analysis of the case you've mentioned above seems impossible to me. > > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-14 14:17 ` Anton Salikhmetov @ 2008-01-15 9:53 ` Miklos Szeredi 2008-01-15 10:46 ` Anton Salikhmetov 0 siblings, 1 reply; 20+ messages in thread From: Miklos Szeredi @ 2008-01-15 9:53 UTC (permalink / raw) To: salikhmetov Cc: a.p.zijlstra, miklos, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, akpm, protasnb > Thanks for your review, Peter and Miklos! > > I overlooked this case when AS_MCTIME flag has been turned off and the > page is still dirty. > > On the other hand, the words "shall be marked for update" may be > considered as just setting the AS_MCTIME flag, not updating the time > stamps. > > What do you think about calling mapping_update_time() inside of "if > (MS_SYNC & flags)"? I suggest such change because the code for > analysis of the case you've mentioned above seems impossible to me. I think that's a good idea. As a first iteration, just updating the mtime/ctime in msync(MS_SYNC) and remove_vma() (called at munmap time) would be a big improvement over what we currently have. I would also recommend, that you drop mapping_update_time() and the related functions from the patch, and just use file_update_time() instead. Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-15 9:53 ` Miklos Szeredi @ 2008-01-15 10:46 ` Anton Salikhmetov 0 siblings, 0 replies; 20+ messages in thread From: Anton Salikhmetov @ 2008-01-15 10:46 UTC (permalink / raw) To: Miklos Szeredi Cc: a.p.zijlstra, linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, akpm, protasnb 2008/1/15, Miklos Szeredi <miklos@szeredi.hu>: > > Thanks for your review, Peter and Miklos! > > > > I overlooked this case when AS_MCTIME flag has been turned off and the > > page is still dirty. > > > > On the other hand, the words "shall be marked for update" may be > > considered as just setting the AS_MCTIME flag, not updating the time > > stamps. > > > > What do you think about calling mapping_update_time() inside of "if > > (MS_SYNC & flags)"? I suggest such change because the code for > > analysis of the case you've mentioned above seems impossible to me. > > I think that's a good idea. As a first iteration, just updating the > mtime/ctime in msync(MS_SYNC) and remove_vma() (called at munmap time) > would be a big improvement over what we currently have. > > I would also recommend, that you drop mapping_update_time() and the > related functions from the patch, and just use file_update_time() > instead. Thank you for your recommendations. I will submit my new solution shortly. By the way, I've already changed the unlink_file_vma() function. > > Miklos > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/2] updating ctime and mtime at syncing 2008-01-14 13:14 ` Miklos Szeredi 2008-01-14 13:35 ` Peter Zijlstra @ 2008-01-14 18:59 ` Anton Salikhmetov 1 sibling, 0 replies; 20+ messages in thread From: Anton Salikhmetov @ 2008-01-14 18:59 UTC (permalink / raw) To: Miklos Szeredi Cc: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm, staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb 2008/1/14, Miklos Szeredi <miklos@szeredi.hu>: > > 2008/1/14, Miklos Szeredi <miklos@szeredi.hu>: > > > > http://bugzilla.kernel.org/show_bug.cgi?id=2645 > > > > > > > > Changes for updating the ctime and mtime fields for memory-mapped files: > > > > > > > > 1) new flag triggering update of the inode data; > > > > 2) new function to update ctime and mtime for block device files; > > > > 3) new helper function to update ctime and mtime when needed; > > > > 4) updating time stamps for mapped files in sys_msync() and do_fsync(); > > > > 5) implementing the feature of auto-updating ctime and mtime. > > > > > > How exactly is this done? > > > > > > Is this catering for this case: > > > > > > 1 page is dirtied through mapping > > > 2 app calls msync(MS_ASYNC) > > > 3 page is written again through mapping > > > 4 app calls msync(MS_ASYNC) > > > 5 ... > > > 6 page is written back > > > > > > What happens at 4? Do we care about this one at all? > > > > The POSIX standard requires updating the file times every time when msync() > > is called with MS_ASYNC. I.e. the time stamps should be updated even > > when no physical synchronization is being done immediately. > > Yes. However, on linux MS_ASYNC is basically a no-op, and without > doing _something_ with the dirty pages (which afaics your patch > doesn't do), it's impossible to observe later writes to the same page. > > I don't advocate full POSIX conformance anymore, because it's probably > too expensive to do (I've tried). Rather than that, we should > probably find some sane compromise, that just fixes the real life > issue. Here's a pointer to the thread about this: > > http://lkml.org/lkml/2007/3/27/55 > > Your patch may be a good soultion, but you should describe in detail > what it does when pages are dirtied, and when msync/fsync are called, > and what happens with multiple msync calls that I've asked about. > > I suspect your patch is ignoring writes after the first msync, but > then why care about msync at all? What's so special about the _first_ > msync? Is it just that most test programs only check this, and not > what happens if msync is called more than once? That would be a bug > in the test cases. > > > > > +/* > > > > + * Update the ctime and mtime stamps for memory-mapped block device files. > > > > + */ > > > > +static void bd_inode_update_time(struct inode *inode) > > > > +{ > > > > + struct block_device *bdev = inode->i_bdev; > > > > + struct list_head *p; > > > > + > > > > + if (bdev == NULL) > > > > + return; > > > > + > > > > + mutex_lock(&bdev->bd_mutex); > > > > + list_for_each(p, &bdev->bd_inodes) { > > > > + inode = list_entry(p, struct inode, i_devices); > > > > + inode_update_time(inode); > > > > + } > > > > + mutex_unlock(&bdev->bd_mutex); > > > > +} > > > > > > Umm, why not just call with file->f_dentry->d_inode, so that you don't > > > need to do this ugly search for the physical inode? The file pointer > > > is available in both msync and fsync. > > > > I'm not sure if I undestood your question. I see two possible > > interpretations for this question, and I'm answering both. > > > > The intention was to make the data changes in the block device data > > visible to all device files associated with the block device. Hence > > the search, because the time stamps for all such device files should > > be updated as well. > > Ahh, but it will only update "active" devices, which are currently > open, no? Is that what we want? > > > Not only the sys_msync() and do_fsync() routines call the helper > > function mapping_update_time(). > > Ah yes, __sync_single_inode() calls it as well. Why? The __sync_single_inode() function calls mapping_update_time() to enable the "auto-updating" feature discussed earlier. > > Miklos > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2008-01-15 10:46 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-01-13 4:39 [PATCH 0/2] yet another attempt to fix the ctime and mtime issue Anton Salikhmetov 2008-01-13 4:39 ` [PATCH 1/2] massive code cleanup of sys_msync() Anton Salikhmetov 2008-01-13 4:46 ` Rik van Riel 2008-01-14 10:49 ` Miklos Szeredi 2008-01-14 11:56 ` Anton Salikhmetov 2008-01-13 4:39 ` [PATCH 2/2] updating ctime and mtime at syncing Anton Salikhmetov 2008-01-13 4:59 ` Rik van Riel 2008-01-14 11:08 ` Miklos Szeredi 2008-01-14 11:15 ` Miklos Szeredi 2008-01-14 12:25 ` Anton Salikhmetov 2008-01-14 12:22 ` Anton Salikhmetov 2008-01-14 13:14 ` Miklos Szeredi 2008-01-14 13:35 ` Peter Zijlstra 2008-01-14 13:39 ` Peter Zijlstra 2008-01-14 13:45 ` Miklos Szeredi 2008-01-14 13:47 ` Miklos Szeredi 2008-01-14 14:17 ` Anton Salikhmetov 2008-01-15 9:53 ` Miklos Szeredi 2008-01-15 10:46 ` Anton Salikhmetov 2008-01-14 18:59 ` Anton Salikhmetov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox