linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Updating ctime and mtime for memory-mapped files [try #4]
@ 2008-01-15 16:02 Anton Salikhmetov
  2008-01-15 16:02 ` [PATCH 1/2] Massive code cleanup of sys_msync() Anton Salikhmetov
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Anton Salikhmetov @ 2008-01-15 16:02 UTC (permalink / raw)
  To: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb,
	miklos

1. Introduction

This is the fourth version of my solution for the bug #2645:

http://bugzilla.kernel.org/show_bug.cgi?id=2645

Changes since the previous version:

1) the case of retouching an already-dirty page pointed out
  by Miklos Szeredi has been addressed;

2) the file metadata are updated using the page modification time
  instead of the time of syncing data;

3) a few small corrections according to the latest feedback.

Brief explanation of these changes as well as some design considerations
are given below.

2. The case of retouching an already-dirtied page

Miklos Szeredi gave the following feedback on the previous version:

> I suspect your patch is ignoring writes after the first msync, but
> then why care about msync at all?  What's so special about the _first_
> msync?  Is it just that most test programs only check this, and not
> what happens if msync is called more than once?  That would be a bug
> in the test cases.

This version adds handling of the case of multiple msync() calls. Before
going on with the explanaion, I'll quote a remark by Peter Zijlstra:

> I must agree, doing the mmap dirty, MS_ASYNC, mmap retouch, MS_ASYNC
> case correctly would need a lot more code which I doubt is worth the
> effort.
>
> It would require scanning the PTEs and marking them read-only again on
> MS_ASYNC, and some more logic in set_page_dirty() because that currently
> bails out early if the page in question is already dirty.

Indeed, the following logic of the __set_pages_dirty_nobuffers() function:

if (!TestSetPageDirty(page)) {
       mapping = page_mapping(page);

       if (!mapping)
               return 1;

       /* critical section */

       if (mapping->host) {
               __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
               set_bit(AS_MCTIME, &mapping->flags);
       }
       return 1;
}
return 0;

made it difficult to account for the case of the already-dirty page
retouch after the call to msync(MS_ASYNC).

In this version of my solution, I redesigned the logic of the same
function as follows:

mapping = page_mapping(page);

if (!mapping)
       return 1;

set_bit(AS_MCTIME, &mapping->flags);

if (TestSetPageDirty(page))
       return 0;

/* critical section */

if (mapping->host) {
       __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);

return 1;

This allows us to set the AS_MCTIME bit independently of whether the page
had already been dirtied or not. Besides, such change makes the logic of
the topmost "if" in this function straight thus improving readability.
Finally, we already have the __set_page_dirty() routine with almost
identical functionality. My redesign of __set_pages_dirty_nobuffers()
is based on how the __set_page_dirty() routine is implemented.

Miklos gave an example of a scenario, where the previous version of
my solution would fail:

http://lkml.org/lkml/2008/1/14/100

Here is how it looks in the version I am sending now:

 1 page is dirtied through mapping
       => the AS_MCTIME bit turned on
 2 app calls msync(MS_ASYNC)
       => inode's times updated, the AS_MCTIME bit turned off
 3 page is written again through mapping
       => the AS_MCTIME bit turned on again
 4 app calls msync(MS_ASYNC)
       => inode's times updated, the AS_MCTIME bit turned off
 5 ...
 6 page is written back
       => ... by this moment, the either the msync(MS_ASYNC) has
          taken care of updating the file times, or the AS_MCTIME
          bit is on.

I think that the feedback about writes after the first msync(MS_ASYNC)
has thereby been addressed.

3. Updating the time stamps of the block device special files

As for the block device case, let's start from the following assumption:

if the block device data changes, we should do our best to tell the world
that this has happened.

This is how I approach this requirement:

1) if the block device is active, this is done at next *sync() through
  calling the bd_inode_update_time() helper function.

2) if the block device is not active, this is done during the block
  device file deactivation in the unlink_file_vma() routine.

Removing either of these actions would leave a possibility of losing
information about the block device data update. That is why I am keeping
both.

4. Recording the time was the file data changed

Finally, I noticed yet another issue with the previous version of my patch.
Specifically, the time stamps were set to the current time of the moment
when syncing but not the write reference was being done. This led to the
following adverse effect on my development system:

1) a text file A was updated by process B;
2) process B exits without calling any of the *sync() functions;
3) vi editor opens the file A;
4) file data synced, file times updated;
5) vi is confused by "thinking" that the file was changed after 3).

This version overcomes this problem by introducing another field into the
address_space structure. This field is used to "remember" the time of
dirtying, and then this time value is propagated to the file metadata.

This approach is based upon the following suggestion given by Peter
Staubach during one of our previous discussions:

http://lkml.org/lkml/2008/1/9/267

> A better architecture would be to arrange for the file times
> to be updated when the page makes the transition from being
> unmodified to modified.  This is not straightforward due to
> the current locking, but should be doable, I think.  Perhaps
> recording the current time and then using it to update the
> file times at a more suitable time (no pun intended) might
> work.

The solution I propose now proves the viability of the latter
approach.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread
* [PATCH 0/2] yet another attempt to fix the ctime and mtime issue
@ 2008-01-13  4:39 Anton Salikhmetov
  2008-01-13  4:39 ` [PATCH 1/2] massive code cleanup of sys_msync() Anton Salikhmetov
  0 siblings, 1 reply; 24+ messages in thread
From: Anton Salikhmetov @ 2008-01-13  4:39 UTC (permalink / raw)
  To: linux-mm, jakob, linux-kernel, valdis.kletnieks, riel, ksm,
	staubach, jesper.juhl, torvalds, a.p.zijlstra, akpm, protasnb

The POSIX standard requires that the ctime and mtime fields
for memory-mapped files should be updated after a write
reference to the memory region where the file data is mapped.
At least FreeBSD 6.2 and HP-UX 11i implement this properly.
Linux does not, which leads to data loss problems in database
backup applications.

Kernel Bug Tracker contains more information about the problem:

http://bugzilla.kernel.org/show_bug.cgi?id=2645

There have been several attempts in the past to address this
issue. Following are a few links to LKML discussions related
to this bug:

http://lkml.org/lkml/2006/5/17/138
http://lkml.org/lkml/2007/2/21/242
http://lkml.org/lkml/2008/1/7/234

All earlier solutions were criticized. Some solutions did not
handle memory-mapped block devices properly. Some led to forcing
applications to explicitly call msync() to update file metadata.
Some contained errors in using kernel synchronization primitives.

In the two patches that follow, I would like to propose a new
solution.

This is the third version of my changes. This version takes
into account all feedback I received for the two previous versions.
The overall design remains basically the same as the one that
was acked by Rick van Riel:

http://lkml.org/lkml/2008/1/11/208

To the best of my knowledge, these patches are free of all the
drawbacks found during previous attempts by Peter Staubach,
Miklos Szeredi and myself.

New since the previous version:

1) no need to explicitly call msync() to update file times;
2) changing block device data is visible to all device files
   associated with the block device;
3) in the cleanup part, the error checks are separated out as
   suggested by Rik van Riel;
4) some small refinements accodring to the LKML comments.

This is how I tested the patches.

1. To test the features mentioned above, I wrote a unit test
   available from

   http://bugzilla.kernel.org/attachment.cgi?id=14430

   I verified that the unit test passed successfully for both
   regular files and block device files. For the unit test I
   used the following architectures: 32-bit x86, x86_64 and
   MIPS32 (cross-compiled from x86_64).

2. I did build tests with allmodconfig and allyesconfig on x86_64.

3. I ran the following test cases from the LTP test suite:

   msync01
   msync02
   msync03
   msync04
   msync05
   mmapstress01
   mmapstress09
   mmapstress10

   No regressions were found by these test cases.

I think that the bug #2645 is resolved by these patches.

Please apply.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2008-01-15 22:16 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-15 16:02 [PATCH 0/2] Updating ctime and mtime for memory-mapped files [try #4] Anton Salikhmetov
2008-01-15 16:02 ` [PATCH 1/2] Massive code cleanup of sys_msync() Anton Salikhmetov
2008-01-15 17:57   ` Christoph Hellwig
2008-01-15 19:02     ` Anton Salikhmetov
2008-01-15 19:10       ` Randy Dunlap
2008-01-15 19:26         ` Anton Salikhmetov
2008-01-15 19:28           ` Peter Zijlstra
2008-01-15 19:32             ` Christoph Hellwig
2008-01-15 20:12               ` Anton Salikhmetov
2008-01-15 20:46         ` Matt Mackall
2008-01-15 21:06           ` Randy Dunlap
2008-01-15 16:02 ` [PATCH 2/2] Updating ctime and mtime at syncing Anton Salikhmetov
     [not found]   ` <1200414911.26045.32.camel@twins>
2008-01-15 17:18     ` Anton Salikhmetov
2008-01-15 19:30       ` Peter Zijlstra
2008-01-15 18:04   ` Christoph Hellwig
2008-01-15 19:04     ` Anton Salikhmetov
2008-01-15 20:27 ` [PATCH 0/2] Updating ctime and mtime for memory-mapped files [try #4] Miklos Szeredi
2008-01-15 20:32   ` Peter Zijlstra
2008-01-15 20:40     ` Miklos Szeredi
2008-01-15 22:15   ` Anton Salikhmetov
  -- strict thread matches above, loose matches on Subject: below --
2008-01-13  4:39 [PATCH 0/2] yet another attempt to fix the ctime and mtime issue Anton Salikhmetov
2008-01-13  4:39 ` [PATCH 1/2] massive code cleanup of sys_msync() Anton Salikhmetov
2008-01-13  4:46   ` Rik van Riel
2008-01-14 10:49   ` Miklos Szeredi
2008-01-14 11:56     ` Anton Salikhmetov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox