* [patch 0/3 resend] mm: close race between dirtying and truncation
@ 2014-12-16 16:18 Johannes Weiner
2014-12-16 16:18 ` [patch 1/3] mm: protect set_page_dirty() from ongoing truncation Johannes Weiner
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Johannes Weiner @ 2014-12-16 16:18 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Hugh Dickins, Michel Lespinasse, Jan Kara,
Kirill A. Shutemov, linux-mm, linux-fsdevel, linux-kernel
Hi Andrew,
it looks like these 3 patches fell on the floor. The first one is a
bug fix that closes a race condition between truncation and dirtying
from do_wp_page(), which can result in a dirty-accounting error. #2
and #3 simplify the code in the area. Please consider them for 3.19.
Thanks!
include/linux/writeback.h | 1 -
mm/memory.c | 54 ++++++++++++++++++---------------------------
mm/page-writeback.c | 43 ++++++++++--------------------------
3 files changed, 34 insertions(+), 64 deletions(-)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread* [patch 1/3] mm: protect set_page_dirty() from ongoing truncation
2014-12-16 16:18 [patch 0/3 resend] mm: close race between dirtying and truncation Johannes Weiner
@ 2014-12-16 16:18 ` Johannes Weiner
2014-12-16 16:18 ` [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas Johannes Weiner
2014-12-16 16:18 ` [patch 3/3] mm: memory: merge shared-writable dirtying branches in do_wp_page() Johannes Weiner
2 siblings, 0 replies; 9+ messages in thread
From: Johannes Weiner @ 2014-12-16 16:18 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Hugh Dickins, Michel Lespinasse, Jan Kara,
Kirill A. Shutemov, linux-mm, linux-fsdevel, linux-kernel
Tejun, while reviewing the code, spotted the following race condition
between the dirtying and truncation of a page:
__set_page_dirty_nobuffers() __delete_from_page_cache()
if (TestSetPageDirty(page))
page->mapping = NULL
if (PageDirty())
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
if (page->mapping)
account_page_dirtied(page)
__inc_zone_page_state(page, NR_FILE_DIRTY);
__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.
Dirtiers usually lock out truncation, either by holding the page lock
directly, or in case of zap_pte_range(), by pinning the mapcount with
the page table lock held. The notable exception to this rule, though,
is do_wp_page(), for which this race exists. However, do_wp_page()
already waits for a locked page to unlock before setting the dirty
bit, in order to prevent a race where clear_page_dirty() misses the
page bit in the presence of dirty ptes. Upgrade that wait to a fully
locked set_page_dirty() to also cover the situation explained above.
Afterwards, the code in set_page_dirty() dealing with a truncation
race is no longer needed. Remove it.
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
include/linux/writeback.h | 1 -
mm/memory.c | 27 +++++++++++++++++----------
mm/page-writeback.c | 43 ++++++++++++-------------------------------
3 files changed, 29 insertions(+), 42 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a219be961c0a..00048339c23e 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -177,7 +177,6 @@ int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data);
int do_writepages(struct address_space *mapping, struct writeback_control *wbc);
-void set_page_dirty_balance(struct page *page);
void writeback_set_ratelimit(void);
void tag_pages_for_writeback(struct address_space *mapping,
pgoff_t start, pgoff_t end);
diff --git a/mm/memory.c b/mm/memory.c
index c3b9097251c5..4c06cdcd36cf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2137,17 +2137,24 @@ reuse:
if (!dirty_page)
return ret;
- /*
- * Yes, Virginia, this is actually required to prevent a race
- * with clear_page_dirty_for_io() from clearing the page dirty
- * bit after it clear all dirty ptes, but before a racing
- * do_wp_page installs a dirty pte.
- *
- * do_shared_fault is protected similarly.
- */
if (!page_mkwrite) {
- wait_on_page_locked(dirty_page);
- set_page_dirty_balance(dirty_page);
+ struct address_space *mapping;
+ int dirtied;
+
+ lock_page(dirty_page);
+ dirtied = set_page_dirty(dirty_page);
+ VM_BUG_ON_PAGE(PageAnon(dirty_page), dirty_page);
+ mapping = dirty_page->mapping;
+ unlock_page(dirty_page);
+
+ if (dirtied && mapping) {
+ /*
+ * Some device drivers do not set page.mapping
+ * but still dirty their pages
+ */
+ balance_dirty_pages_ratelimited(mapping);
+ }
+
/* file_update_time outside page_lock */
if (vma->vm_file)
file_update_time(vma->vm_file);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d5d81f5384d1..6f4335238e33 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1541,16 +1541,6 @@ pause:
bdi_start_background_writeback(bdi);
}
-void set_page_dirty_balance(struct page *page)
-{
- if (set_page_dirty(page)) {
- struct address_space *mapping = page_mapping(page);
-
- if (mapping)
- balance_dirty_pages_ratelimited(mapping);
- }
-}
-
static DEFINE_PER_CPU(int, bdp_ratelimits);
/*
@@ -2123,32 +2113,25 @@ EXPORT_SYMBOL(account_page_dirtied);
* page dirty in that case, but not all the buffers. This is a "bottom-up"
* dirtying, whereas __set_page_dirty_buffers() is a "top-down" dirtying.
*
- * Most callers have locked the page, which pins the address_space in memory.
- * But zap_pte_range() does not lock the page, however in that case the
- * mapping is pinned by the vma's ->vm_file reference.
- *
- * We take care to handle the case where the page was truncated from the
- * mapping by re-checking page_mapping() inside tree_lock.
+ * The caller must ensure this doesn't race with truncation. Most will simply
+ * hold the page lock, but e.g. zap_pte_range() calls with the page mapped and
+ * the pte lock held, which also locks out truncation.
*/
int __set_page_dirty_nobuffers(struct page *page)
{
if (!TestSetPageDirty(page)) {
struct address_space *mapping = page_mapping(page);
- struct address_space *mapping2;
unsigned long flags;
if (!mapping)
return 1;
spin_lock_irqsave(&mapping->tree_lock, flags);
- mapping2 = page_mapping(page);
- if (mapping2) { /* Race with truncate? */
- BUG_ON(mapping2 != mapping);
- WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
- account_page_dirtied(page, mapping);
- radix_tree_tag_set(&mapping->page_tree,
- page_index(page), PAGECACHE_TAG_DIRTY);
- }
+ BUG_ON(page_mapping(page) != mapping);
+ WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
+ account_page_dirtied(page, mapping);
+ radix_tree_tag_set(&mapping->page_tree, page_index(page),
+ PAGECACHE_TAG_DIRTY);
spin_unlock_irqrestore(&mapping->tree_lock, flags);
if (mapping->host) {
/* !PageAnon && !swapper_space */
@@ -2305,12 +2288,10 @@ int clear_page_dirty_for_io(struct page *page)
/*
* We carefully synchronise fault handlers against
* installing a dirty pte and marking the page dirty
- * at this point. We do this by having them hold the
- * page lock at some point after installing their
- * pte, but before marking the page dirty.
- * Pages are always locked coming in here, so we get
- * the desired exclusion. See mm/memory.c:do_wp_page()
- * for more comments.
+ * at this point. We do this by having them hold the
+ * page lock while dirtying the page, and pages are
+ * always locked coming in here, so we get the desired
+ * exclusion.
*/
if (TestClearPageDirty(page)) {
dec_zone_page_state(page, NR_FILE_DIRTY);
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread* [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas
2014-12-16 16:18 [patch 0/3 resend] mm: close race between dirtying and truncation Johannes Weiner
2014-12-16 16:18 ` [patch 1/3] mm: protect set_page_dirty() from ongoing truncation Johannes Weiner
@ 2014-12-16 16:18 ` Johannes Weiner
2014-12-16 16:18 ` [patch 3/3] mm: memory: merge shared-writable dirtying branches in do_wp_page() Johannes Weiner
2 siblings, 0 replies; 9+ messages in thread
From: Johannes Weiner @ 2014-12-16 16:18 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Hugh Dickins, Michel Lespinasse, Jan Kara,
Kirill A. Shutemov, linux-mm, linux-fsdevel, linux-kernel
Shared anonymous mmaps are implemented with shmem files, so all VMAs
with shared writable semantics also have an underlying backing file.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
mm/memory.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 4c06cdcd36cf..d5a303cf2901 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2155,9 +2155,7 @@ reuse:
balance_dirty_pages_ratelimited(mapping);
}
- /* file_update_time outside page_lock */
- if (vma->vm_file)
- file_update_time(vma->vm_file);
+ file_update_time(vma->vm_file);
}
put_page(dirty_page);
if (page_mkwrite) {
@@ -3013,8 +3011,7 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
balance_dirty_pages_ratelimited(mapping);
}
- /* file_update_time outside page_lock */
- if (vma->vm_file && !vma->vm_ops->page_mkwrite)
+ if (!vma->vm_ops->page_mkwrite)
file_update_time(vma->vm_file);
return ret;
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread* [patch 3/3] mm: memory: merge shared-writable dirtying branches in do_wp_page()
2014-12-16 16:18 [patch 0/3 resend] mm: close race between dirtying and truncation Johannes Weiner
2014-12-16 16:18 ` [patch 1/3] mm: protect set_page_dirty() from ongoing truncation Johannes Weiner
2014-12-16 16:18 ` [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas Johannes Weiner
@ 2014-12-16 16:18 ` Johannes Weiner
2 siblings, 0 replies; 9+ messages in thread
From: Johannes Weiner @ 2014-12-16 16:18 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Hugh Dickins, Michel Lespinasse, Jan Kara,
Kirill A. Shutemov, linux-mm, linux-fsdevel, linux-kernel
Whether there is a vm_ops->page_mkwrite or not, the page dirtying is
pretty much the same. Make sure the page references are the same in
both cases, then merge the two branches.
It's tempting to go even further and page-lock the !page_mkwrite case,
to get it in line with everybody else setting the page table and thus
further simplify the model. But that's not quite compelling enough to
justify dropping the pte lock, then relocking and verifying the entry
for filesystems without ->page_mkwrite, which notably includes tmpfs.
Leave it for now and lock the page late in the !page_mkwrite case.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
mm/memory.c | 48 +++++++++++++++++-------------------------------
1 file changed, 17 insertions(+), 31 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index d5a303cf2901..a26f30b6abd1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2033,7 +2033,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
pte_t entry;
int ret = 0;
int page_mkwrite = 0;
- struct page *dirty_page = NULL;
+ bool dirty_shared = false;
unsigned long mmun_start = 0; /* For mmu_notifiers */
unsigned long mmun_end = 0; /* For mmu_notifiers */
struct mem_cgroup *memcg;
@@ -2084,6 +2084,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
unlock_page(old_page);
} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
(VM_WRITE|VM_SHARED))) {
+ page_cache_get(old_page);
/*
* Only catch write-faults on shared writable pages,
* read-only shared pages can get COWed by
@@ -2091,7 +2092,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
*/
if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
int tmp;
- page_cache_get(old_page);
+
pte_unmap_unlock(page_table, ptl);
tmp = do_page_mkwrite(vma, old_page, address);
if (unlikely(!tmp || (tmp &
@@ -2111,11 +2112,10 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
unlock_page(old_page);
goto unlock;
}
-
page_mkwrite = 1;
}
- dirty_page = old_page;
- get_page(dirty_page);
+
+ dirty_shared = true;
reuse:
/*
@@ -2134,43 +2134,29 @@ reuse:
pte_unmap_unlock(page_table, ptl);
ret |= VM_FAULT_WRITE;
- if (!dirty_page)
- return ret;
-
- if (!page_mkwrite) {
+ if (dirty_shared) {
struct address_space *mapping;
int dirtied;
- lock_page(dirty_page);
- dirtied = set_page_dirty(dirty_page);
- VM_BUG_ON_PAGE(PageAnon(dirty_page), dirty_page);
- mapping = dirty_page->mapping;
- unlock_page(dirty_page);
+ if (!page_mkwrite)
+ lock_page(old_page);
- if (dirtied && mapping) {
- /*
- * Some device drivers do not set page.mapping
- * but still dirty their pages
- */
- balance_dirty_pages_ratelimited(mapping);
- }
+ dirtied = set_page_dirty(old_page);
+ VM_BUG_ON_PAGE(PageAnon(old_page), old_page);
+ mapping = old_page->mapping;
+ unlock_page(old_page);
+ page_cache_release(old_page);
- file_update_time(vma->vm_file);
- }
- put_page(dirty_page);
- if (page_mkwrite) {
- struct address_space *mapping = dirty_page->mapping;
-
- set_page_dirty(dirty_page);
- unlock_page(dirty_page);
- page_cache_release(dirty_page);
- if (mapping) {
+ if ((dirtied || page_mkwrite) && mapping) {
/*
* Some device drivers do not set page.mapping
* but still dirty their pages
*/
balance_dirty_pages_ratelimited(mapping);
}
+
+ if (!page_mkwrite)
+ file_update_time(vma->vm_file);
}
return ret;
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch 1/3] mm: protect set_page_dirty() from ongoing truncation
@ 2014-12-05 14:52 Johannes Weiner
2014-12-05 14:52 ` [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas Johannes Weiner
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Weiner @ 2014-12-05 14:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Hugh Dickins, Michel Lespinasse, Jan Kara,
Kirill A. Shutemov, linux-mm, linux-fsdevel, linux-kernel
Tejun, while reviewing the code, spotted the following race condition
between the dirtying and truncation of a page:
__set_page_dirty_nobuffers() __delete_from_page_cache()
if (TestSetPageDirty(page))
page->mapping = NULL
if (PageDirty())
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
if (page->mapping)
account_page_dirtied(page)
__inc_zone_page_state(page, NR_FILE_DIRTY);
__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.
Dirtiers usually lock out truncation, either by holding the page lock
directly, or in case of zap_pte_range(), by pinning the mapcount with
the page table lock held. The notable exception to this rule, though,
is do_wp_page(), for which this race exists. However, do_wp_page()
already waits for a locked page to unlock before setting the dirty
bit, in order to prevent a race where clear_page_dirty() misses the
page bit in the presence of dirty ptes. Upgrade that wait to a fully
locked set_page_dirty() to also cover the situation explained above.
Afterwards, the code in set_page_dirty() dealing with a truncation
race is no longer needed. Remove it.
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/writeback.h | 1 -
mm/memory.c | 27 +++++++++++++++++----------
mm/page-writeback.c | 43 ++++++++++++-------------------------------
3 files changed, 29 insertions(+), 42 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a219be961c0a..00048339c23e 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -177,7 +177,6 @@ int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data);
int do_writepages(struct address_space *mapping, struct writeback_control *wbc);
-void set_page_dirty_balance(struct page *page);
void writeback_set_ratelimit(void);
void tag_pages_for_writeback(struct address_space *mapping,
pgoff_t start, pgoff_t end);
diff --git a/mm/memory.c b/mm/memory.c
index 3e503831e042..72d998eb0438 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2150,17 +2150,24 @@ reuse:
if (!dirty_page)
return ret;
- /*
- * Yes, Virginia, this is actually required to prevent a race
- * with clear_page_dirty_for_io() from clearing the page dirty
- * bit after it clear all dirty ptes, but before a racing
- * do_wp_page installs a dirty pte.
- *
- * do_shared_fault is protected similarly.
- */
if (!page_mkwrite) {
- wait_on_page_locked(dirty_page);
- set_page_dirty_balance(dirty_page);
+ struct address_space *mapping;
+ int dirtied;
+
+ lock_page(dirty_page);
+ dirtied = set_page_dirty(dirty_page);
+ VM_BUG_ON_PAGE(PageAnon(dirty_page), dirty_page);
+ mapping = dirty_page->mapping;
+ unlock_page(dirty_page);
+
+ if (dirtied && mapping) {
+ /*
+ * Some device drivers do not set page.mapping
+ * but still dirty their pages
+ */
+ balance_dirty_pages_ratelimited(mapping);
+ }
+
/* file_update_time outside page_lock */
if (vma->vm_file)
file_update_time(vma->vm_file);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 19ceae87522d..437174a2aaa3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1541,16 +1541,6 @@ pause:
bdi_start_background_writeback(bdi);
}
-void set_page_dirty_balance(struct page *page)
-{
- if (set_page_dirty(page)) {
- struct address_space *mapping = page_mapping(page);
-
- if (mapping)
- balance_dirty_pages_ratelimited(mapping);
- }
-}
-
static DEFINE_PER_CPU(int, bdp_ratelimits);
/*
@@ -2123,32 +2113,25 @@ EXPORT_SYMBOL(account_page_dirtied);
* page dirty in that case, but not all the buffers. This is a "bottom-up"
* dirtying, whereas __set_page_dirty_buffers() is a "top-down" dirtying.
*
- * Most callers have locked the page, which pins the address_space in memory.
- * But zap_pte_range() does not lock the page, however in that case the
- * mapping is pinned by the vma's ->vm_file reference.
- *
- * We take care to handle the case where the page was truncated from the
- * mapping by re-checking page_mapping() inside tree_lock.
+ * The caller must ensure this doesn't race with truncation. Most will simply
+ * hold the page lock, but e.g. zap_pte_range() calls with the page mapped and
+ * the pte lock held, which also locks out truncation.
*/
int __set_page_dirty_nobuffers(struct page *page)
{
if (!TestSetPageDirty(page)) {
struct address_space *mapping = page_mapping(page);
- struct address_space *mapping2;
unsigned long flags;
if (!mapping)
return 1;
spin_lock_irqsave(&mapping->tree_lock, flags);
- mapping2 = page_mapping(page);
- if (mapping2) { /* Race with truncate? */
- BUG_ON(mapping2 != mapping);
- WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
- account_page_dirtied(page, mapping);
- radix_tree_tag_set(&mapping->page_tree,
- page_index(page), PAGECACHE_TAG_DIRTY);
- }
+ BUG_ON(page_mapping(page) != mapping);
+ WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
+ account_page_dirtied(page, mapping);
+ radix_tree_tag_set(&mapping->page_tree, page_index(page),
+ PAGECACHE_TAG_DIRTY);
spin_unlock_irqrestore(&mapping->tree_lock, flags);
if (mapping->host) {
/* !PageAnon && !swapper_space */
@@ -2305,12 +2288,10 @@ int clear_page_dirty_for_io(struct page *page)
/*
* We carefully synchronise fault handlers against
* installing a dirty pte and marking the page dirty
- * at this point. We do this by having them hold the
- * page lock at some point after installing their
- * pte, but before marking the page dirty.
- * Pages are always locked coming in here, so we get
- * the desired exclusion. See mm/memory.c:do_wp_page()
- * for more comments.
+ * at this point. We do this by having them hold the
+ * page lock while dirtying the page, and pages are
+ * always locked coming in here, so we get the desired
+ * exclusion.
*/
if (TestClearPageDirty(page)) {
dec_zone_page_state(page, NR_FILE_DIRTY);
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread* [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas
2014-12-05 14:52 [patch 1/3] mm: protect set_page_dirty() from ongoing truncation Johannes Weiner
@ 2014-12-05 14:52 ` Johannes Weiner
0 siblings, 0 replies; 9+ messages in thread
From: Johannes Weiner @ 2014-12-05 14:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Hugh Dickins, Michel Lespinasse, Jan Kara,
Kirill A. Shutemov, linux-mm, linux-fsdevel, linux-kernel
Shared anonymous mmaps are implemented with shmem files, so all VMAs
with shared writable semantics also have an underlying backing file.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
mm/memory.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 72d998eb0438..5640a718ac58 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2168,9 +2168,7 @@ reuse:
balance_dirty_pages_ratelimited(mapping);
}
- /* file_update_time outside page_lock */
- if (vma->vm_file)
- file_update_time(vma->vm_file);
+ file_update_time(vma->vm_file);
}
put_page(dirty_page);
if (page_mkwrite) {
@@ -3026,8 +3024,7 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
balance_dirty_pages_ratelimited(mapping);
}
- /* file_update_time outside page_lock */
- if (vma->vm_file && !vma->vm_ops->page_mkwrite)
+ if (!vma->vm_ops->page_mkwrite)
file_update_time(vma->vm_file);
return ret;
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch 1/3] mm: protect set_page_dirty() from ongoing truncation
@ 2014-12-01 22:58 Johannes Weiner
2014-12-01 22:58 ` [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas Johannes Weiner
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Weiner @ 2014-12-01 22:58 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Hugh Dickins, Michel Lespinasse, Jan Kara, linux-mm,
linux-fsdevel, linux-kernel
Tejun, while reviewing the code, spotted the following race condition
between the dirtying and truncation of a page:
__set_page_dirty_nobuffers() __delete_from_page_cache()
if (TestSetPageDirty(page))
page->mapping = NULL
if (PageDirty())
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
if (page->mapping)
account_page_dirtied(page)
__inc_zone_page_state(page, NR_FILE_DIRTY);
__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.
Dirtiers usually lock out truncation, either by holding the page lock
directly, or in case of zap_pte_range(), by pinning the mapcount with
the page table lock held. The notable exception to this rule, though,
is do_wp_page(), for which this race exists. However, do_wp_page()
already waits for a locked page to unlock before setting the dirty
bit, in order to prevent a race where clear_page_dirty() misses the
page bit in the presence of dirty ptes. Upgrade that wait to a fully
locked set_page_dirty() to also cover the situation explained above.
Afterwards, the code in set_page_dirty() dealing with a truncation
race is no longer needed. Remove it.
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
---
include/linux/writeback.h | 1 -
mm/memory.c | 26 ++++++++++++++++----------
mm/page-writeback.c | 43 ++++++++++++-------------------------------
3 files changed, 28 insertions(+), 42 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a219be961c0a..00048339c23e 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -177,7 +177,6 @@ int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data);
int do_writepages(struct address_space *mapping, struct writeback_control *wbc);
-void set_page_dirty_balance(struct page *page);
void writeback_set_ratelimit(void);
void tag_pages_for_writeback(struct address_space *mapping,
pgoff_t start, pgoff_t end);
diff --git a/mm/memory.c b/mm/memory.c
index 3e503831e042..73220eb6e9e3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2150,17 +2150,23 @@ reuse:
if (!dirty_page)
return ret;
- /*
- * Yes, Virginia, this is actually required to prevent a race
- * with clear_page_dirty_for_io() from clearing the page dirty
- * bit after it clear all dirty ptes, but before a racing
- * do_wp_page installs a dirty pte.
- *
- * do_shared_fault is protected similarly.
- */
if (!page_mkwrite) {
- wait_on_page_locked(dirty_page);
- set_page_dirty_balance(dirty_page);
+ struct address_space *mapping;
+ int dirtied;
+
+ lock_page(dirty_page);
+ dirtied = set_page_dirty(dirty_page);
+ mapping = dirty_page->mapping;
+ unlock_page(dirty_page);
+
+ if (dirtied && mapping) {
+ /*
+ * Some device drivers do not set page.mapping
+ * but still dirty their pages
+ */
+ balance_dirty_pages_ratelimited(mapping);
+ }
+
/* file_update_time outside page_lock */
if (vma->vm_file)
file_update_time(vma->vm_file);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 19ceae87522d..437174a2aaa3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1541,16 +1541,6 @@ pause:
bdi_start_background_writeback(bdi);
}
-void set_page_dirty_balance(struct page *page)
-{
- if (set_page_dirty(page)) {
- struct address_space *mapping = page_mapping(page);
-
- if (mapping)
- balance_dirty_pages_ratelimited(mapping);
- }
-}
-
static DEFINE_PER_CPU(int, bdp_ratelimits);
/*
@@ -2123,32 +2113,25 @@ EXPORT_SYMBOL(account_page_dirtied);
* page dirty in that case, but not all the buffers. This is a "bottom-up"
* dirtying, whereas __set_page_dirty_buffers() is a "top-down" dirtying.
*
- * Most callers have locked the page, which pins the address_space in memory.
- * But zap_pte_range() does not lock the page, however in that case the
- * mapping is pinned by the vma's ->vm_file reference.
- *
- * We take care to handle the case where the page was truncated from the
- * mapping by re-checking page_mapping() inside tree_lock.
+ * The caller must ensure this doesn't race with truncation. Most will simply
+ * hold the page lock, but e.g. zap_pte_range() calls with the page mapped and
+ * the pte lock held, which also locks out truncation.
*/
int __set_page_dirty_nobuffers(struct page *page)
{
if (!TestSetPageDirty(page)) {
struct address_space *mapping = page_mapping(page);
- struct address_space *mapping2;
unsigned long flags;
if (!mapping)
return 1;
spin_lock_irqsave(&mapping->tree_lock, flags);
- mapping2 = page_mapping(page);
- if (mapping2) { /* Race with truncate? */
- BUG_ON(mapping2 != mapping);
- WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
- account_page_dirtied(page, mapping);
- radix_tree_tag_set(&mapping->page_tree,
- page_index(page), PAGECACHE_TAG_DIRTY);
- }
+ BUG_ON(page_mapping(page) != mapping);
+ WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
+ account_page_dirtied(page, mapping);
+ radix_tree_tag_set(&mapping->page_tree, page_index(page),
+ PAGECACHE_TAG_DIRTY);
spin_unlock_irqrestore(&mapping->tree_lock, flags);
if (mapping->host) {
/* !PageAnon && !swapper_space */
@@ -2305,12 +2288,10 @@ int clear_page_dirty_for_io(struct page *page)
/*
* We carefully synchronise fault handlers against
* installing a dirty pte and marking the page dirty
- * at this point. We do this by having them hold the
- * page lock at some point after installing their
- * pte, but before marking the page dirty.
- * Pages are always locked coming in here, so we get
- * the desired exclusion. See mm/memory.c:do_wp_page()
- * for more comments.
+ * at this point. We do this by having them hold the
+ * page lock while dirtying the page, and pages are
+ * always locked coming in here, so we get the desired
+ * exclusion.
*/
if (TestClearPageDirty(page)) {
dec_zone_page_state(page, NR_FILE_DIRTY);
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread* [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas
2014-12-01 22:58 [patch 1/3] mm: protect set_page_dirty() from ongoing truncation Johannes Weiner
@ 2014-12-01 22:58 ` Johannes Weiner
2014-12-02 8:58 ` Jan Kara
2014-12-02 12:00 ` Kirill A. Shutemov
0 siblings, 2 replies; 9+ messages in thread
From: Johannes Weiner @ 2014-12-01 22:58 UTC (permalink / raw)
To: Andrew Morton
Cc: Tejun Heo, Hugh Dickins, Michel Lespinasse, Jan Kara, linux-mm,
linux-fsdevel, linux-kernel
The only way a VMA can have shared and writable semantics is with a
backing file.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memory.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 73220eb6e9e3..2a2e3648ed65 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2167,9 +2167,7 @@ reuse:
balance_dirty_pages_ratelimited(mapping);
}
- /* file_update_time outside page_lock */
- if (vma->vm_file)
- file_update_time(vma->vm_file);
+ file_update_time(vma->vm_file);
}
put_page(dirty_page);
if (page_mkwrite) {
@@ -3025,8 +3023,7 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
balance_dirty_pages_ratelimited(mapping);
}
- /* file_update_time outside page_lock */
- if (vma->vm_file && !vma->vm_ops->page_mkwrite)
+ if (!vma->vm_ops->page_mkwrite)
file_update_time(vma->vm_file);
return ret;
--
2.1.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas
2014-12-01 22:58 ` [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas Johannes Weiner
@ 2014-12-02 8:58 ` Jan Kara
2014-12-02 15:18 ` Johannes Weiner
2014-12-02 12:00 ` Kirill A. Shutemov
1 sibling, 1 reply; 9+ messages in thread
From: Jan Kara @ 2014-12-02 8:58 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Tejun Heo, Hugh Dickins, Michel Lespinasse,
Jan Kara, linux-mm, linux-fsdevel, linux-kernel
On Mon 01-12-14 17:58:01, Johannes Weiner wrote:
> The only way a VMA can have shared and writable semantics is with a
> backing file.
OK, one always learns :) After some digging I found that MAP_SHARED |
MAP_ANONYMOUS mappings are in fact mappings of a temporary file in tmpfs.
It would be worth to mention this in the changelog I believe. Otherwise
feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> mm/memory.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 73220eb6e9e3..2a2e3648ed65 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2167,9 +2167,7 @@ reuse:
> balance_dirty_pages_ratelimited(mapping);
> }
>
> - /* file_update_time outside page_lock */
> - if (vma->vm_file)
> - file_update_time(vma->vm_file);
> + file_update_time(vma->vm_file);
> }
> put_page(dirty_page);
> if (page_mkwrite) {
> @@ -3025,8 +3023,7 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> balance_dirty_pages_ratelimited(mapping);
> }
>
> - /* file_update_time outside page_lock */
> - if (vma->vm_file && !vma->vm_ops->page_mkwrite)
> + if (!vma->vm_ops->page_mkwrite)
> file_update_time(vma->vm_file);
>
> return ret;
> --
> 2.1.3
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas
2014-12-02 8:58 ` Jan Kara
@ 2014-12-02 15:18 ` Johannes Weiner
0 siblings, 0 replies; 9+ messages in thread
From: Johannes Weiner @ 2014-12-02 15:18 UTC (permalink / raw)
To: Jan Kara
Cc: Andrew Morton, Tejun Heo, Hugh Dickins, Michel Lespinasse,
linux-mm, linux-fsdevel, linux-kernel
On Tue, Dec 02, 2014 at 09:58:25AM +0100, Jan Kara wrote:
> On Mon 01-12-14 17:58:01, Johannes Weiner wrote:
> > The only way a VMA can have shared and writable semantics is with a
> > backing file.
> OK, one always learns :) After some digging I found that MAP_SHARED |
> MAP_ANONYMOUS mappings are in fact mappings of a temporary file in tmpfs.
> It would be worth to mention this in the changelog I believe. Otherwise
> feel free to add:
> Reviewed-by: Jan Kara <jack@suse.cz>
Thanks, Jan. I updated the changelog to read:
Shared anonymous mmaps are implemented with shmem files, so all VMAs
with shared writable semantics also have an underlying backing file.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas
2014-12-01 22:58 ` [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas Johannes Weiner
2014-12-02 8:58 ` Jan Kara
@ 2014-12-02 12:00 ` Kirill A. Shutemov
1 sibling, 0 replies; 9+ messages in thread
From: Kirill A. Shutemov @ 2014-12-02 12:00 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Tejun Heo, Hugh Dickins, Michel Lespinasse,
Jan Kara, linux-mm, linux-fsdevel, linux-kernel
On Mon, Dec 01, 2014 at 05:58:01PM -0500, Johannes Weiner wrote:
> The only way a VMA can have shared and writable semantics is with a
> backing file.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-12-16 16:18 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-16 16:18 [patch 0/3 resend] mm: close race between dirtying and truncation Johannes Weiner
2014-12-16 16:18 ` [patch 1/3] mm: protect set_page_dirty() from ongoing truncation Johannes Weiner
2014-12-16 16:18 ` [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas Johannes Weiner
2014-12-16 16:18 ` [patch 3/3] mm: memory: merge shared-writable dirtying branches in do_wp_page() Johannes Weiner
-- strict thread matches above, loose matches on Subject: below --
2014-12-05 14:52 [patch 1/3] mm: protect set_page_dirty() from ongoing truncation Johannes Weiner
2014-12-05 14:52 ` [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas Johannes Weiner
2014-12-01 22:58 [patch 1/3] mm: protect set_page_dirty() from ongoing truncation Johannes Weiner
2014-12-01 22:58 ` [patch 2/3] mm: memory: remove ->vm_file check on shared writable vmas Johannes Weiner
2014-12-02 8:58 ` Jan Kara
2014-12-02 15:18 ` Johannes Weiner
2014-12-02 12:00 ` Kirill A. Shutemov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox